Age-related macular degeneration (AMD) is a leading cause of irreversible vision loss, making effective prognosis crucial for timely intervention. In this work, we propose AMD-Mamba, a novel multi-modal framework for AMD prognosis, and further develop a new AMD biomarker. This framework integrates color fundus images with genetic variants and socio-demographic variables. At its core, AMD-Mamba introduces an innovative metric learning strategy that leverages the AMD severity scale score as prior knowledge. This strategy allows the model to learn richer feature representations by aligning learned features with clinical phenotypes, thereby improving the capability of conventional prognosis methods to capture disease progression patterns. In addition, unlike existing models that use traditional CNN backbones and focus primarily on local information, such as the presence of drusen, AMD-Mamba applies Vision Mamba and simultaneously fuses local and long-range global information, such as vascular changes. Furthermore, we enhance prediction performance through multi-scale fusion, combining image information with clinical variables at different resolutions. We evaluate AMD-Mamba on the AREDS dataset, which includes 45,818 color fundus photographs, 52 genetic variants, and 3 socio-demographic variables from 2,741 subjects. Our experimental results demonstrate that our proposed biomarker is one of the most significant biomarkers for AMD progression. Notably, combining this biomarker with other existing variables yields promising improvements in detecting high-risk AMD patients at early stages. These findings highlight the potential of our multi-modal framework to facilitate more precise and proactive management of AMD.
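To illustrate the severity-prior metric learning idea, here is a minimal sketch (PyTorch) of a pairwise loss that pulls together embeddings of eyes with similar AMD severity scale scores and pushes apart dissimilar ones; the function name, margin, and score-gap threshold are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def severity_metric_loss(features, severity, margin=1.0, tau=1.0):
    """Hypothetical severity-aware contrastive loss.
    features: (B, D) learned embeddings; severity: (B,) AMD severity scale scores."""
    feats = F.normalize(features, dim=1)
    dist = torch.cdist(feats, feats, p=2)                       # (B, B) pairwise distances
    sev_diff = (severity[:, None] - severity[None, :]).abs()    # (B, B) severity gaps
    similar = (sev_diff <= tau).float()                         # pairs with close clinical phenotype
    loss_pos = similar * dist.pow(2)                            # pull similar-severity pairs together
    loss_neg = (1.0 - similar) * F.relu(margin - dist).pow(2)   # push dissimilar pairs beyond the margin
    mask = 1.0 - torch.eye(features.shape[0], device=features.device)
    return ((loss_pos + loss_neg) * mask).sum() / mask.sum()
```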
{"title":"AMD-Mamba: A Phenotype-Aware Multi-modal Framework for Robust AMD Prognosis.","authors":"Puzhen Wu, Mingquan Lin, Qingyu Chen, Emily Y Chew, Zhiyong Lu, Yifan Peng, Hexin Dong","doi":"10.1007/978-3-032-09513-8_15","DOIUrl":"10.1007/978-3-032-09513-8_15","url":null,"abstract":"<p><p>Age-related macular degeneration (AMD) is a leading cause of irreversible vision loss, making effective prognosis crucial for timely intervention. In this work, we propose AMD-Mamba, a novel multi-modal framework for AMD prognosis, and further develop a new AMD biomarker. This framework integrates color fundus images with genetic variants and socio-demographic variables. At its core, AMD-Mamba introduces an innovative metric learning strategy that leverages AMD severity scale score as prior knowledge. This strategy allows the model to learn richer feature representations by aligning learned features with clinical phenotypes, thereby improving the capability of conventional prognosis methods in capturing disease progression patterns. In addition, unlike existing models that use traditional CNN backbones and focus primarily on local information, such as the presence of drusen, AMD-Mamba applies Vision Mamba and simultaneously fuses local and long-range global information, such as vascular changes. Furthermore, we enhance prediction performance through multi-scale fusion, combining image information with clinical variables at different resolutions. We evaluate AMD-Mamba on the AREDS dataset, which includes 45,818 color fundus photographs, 52 genetic variants, and 3 socio-demographic variables from 2,741 subjects. Our experimental results demonstrate that our proposed biomarker is one of the most significant biomarkers for the progression of AMD. Notably, combining this biomarker with other existing variables yields promising improvements in detecting high-risk AMD patients at early stages. These findings highlight the potential of our multi-modal framework to facilitate more precise and proactive management of AMD.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"16241 ","pages":"150-160"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790706/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145960759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01. Epub Date: 2024-10-23. DOI: 10.1007/978-3-031-73290-4_8
Pablo Blasco Fernandez, Karthik Gopinath, John Williams-Ramirez, Rogeny Herisse, Lucas J Deden-Binder, Dina Zemlyanker, Theressa Connors, Liana Kozanno, Derek Oakley, Bradley Hyman, Sean I Young, Juan Eugenio Iglesias
Parcellation of mesh models for cortical analysis is a central problem in neuroimaging. Most classical and deep learning methods have requisites in terms of mesh topology, requiring inputs that are homeomorphic to a sphere (i.e., no holes or handles). Topology correction algorithms do exist, but their computational complexity is quadratic in the size of the topological defects, sometimes requiring hours, which effectively precludes segmentation of topologically incorrect meshes, including those derived from imperfect segmentations or obtained from inherently noisy modalities like surface scanning. Furthermore, deep learning mesh segmentation also struggles with surface scans of brains because they are relatively nondescript and require modeling of longer-range dependencies. Here we propose "pseudo-render-inverse-render" (PRIR), a novel perspective on cortical mesh parcellation that effectively reframes the problem as a 2D segmentation task using a direct-inverse rendering framework. Our approach: (i) renders the mesh from a number of perspectives, projecting the three components of the face normal vectors to a three-channel image; (ii) segments these images with U-Nets; (iii) maps the 2D segmentations back to vertices (inverse rendering); and (iv) aggregates the information from multiple views, postprocessing the output with a Markov Random Field to ensure smoothness and segmentation of occluded areas. PRIR is not affected by mesh topology and easily captures long-range dependencies with the U-Nets. Our results demonstrate state-of-the-art accuracy on topologically correct white matter meshes, equally accurate performance on simulated surface scans, and robust segmentation of real surface scans.
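The multi-view aggregation and smoothing steps (iii)-(iv) can be sketched as follows (NumPy); the visibility-weighted averaging and the simple iterative label smoothing stand in for the paper's inverse rendering and Markov Random Field post-processing, and all names are illustrative.

```python
import numpy as np

def aggregate_views(per_view_probs, visible):
    """per_view_probs: (n_views, n_verts, C) per-view class probabilities mapped back to vertices.
    visible: (n_views, n_verts) boolean visibility of each vertex in each rendered view."""
    w = visible[..., None].astype(float)
    return (per_view_probs * w).sum(0) / np.clip(w.sum(0), 1e-6, None)   # (n_verts, C)

def smooth_labels(probs, edges, n_iter=5, beta=0.5):
    """Crude iterative smoothing over mesh edges (a stand-in for MRF post-processing).
    edges: iterable of (i, j) vertex index pairs."""
    labels = probs.argmax(1)
    neighbors = [[] for _ in range(len(probs))]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(n_iter):
        for v in range(len(probs)):
            if neighbors[v]:
                votes = np.bincount(labels[neighbors[v]], minlength=probs.shape[1])
                labels[v] = (probs[v] + beta * votes / len(neighbors[v])).argmax()
    return labels
```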
{"title":"Pseudo-Rendering for Resolution and Topology-Invariant Cortical Parcellation.","authors":"Pablo Blasco Fernandez, Karthik Gopinath, John Williams-Ramirez, Rogeny Herisse, Lucas J Deden-Binder, Dina Zemlyanker, Theressa Connors, Liana Kozanno, Derek Oakley, Bradley Hyman, Sean I Young, Juan Eugenio Iglesias","doi":"10.1007/978-3-031-73290-4_8","DOIUrl":"10.1007/978-3-031-73290-4_8","url":null,"abstract":"<p><p>Parcellation of mesh models for cortical analysis is a central problem in neuroimaging. Most classical and deep learning methods have requisites in terms of mesh topology, requiring inputs that are homeomorphic to a sphere (i.e., no holes or handles). Topology correction algorithms do exist, but their computational complexity is quadratic with the size of the topological defects - sometimes hours, effectively precluding segmentation of incorrect meshes, including those derived from imperfect segmentations or obtained from inherently noisy modalities like surface scanning. Furthermore, deep learning mesh segmentation also struggles surface scans of brains because they are relatively nondescript and require modeling of longer-range dependencies. Here we propose \"pseudo-render-inverse-render\" (PRIR), a novel perspective on cortical mesh parcellation that effectively reframes the problem as a 2D segmentation task using an direct-inverse rendering framework. Our approach: <i>(i)</i> renders the mesh from a number of perspectives, projecting the three components of the face normal vectors to a three-channel image; <i>(ii)</i> segments these images with U-Nets; <i>(iii)</i> maps the 2D segmentations back to vertices (inverse rendering); and <i>(iv)</i> aggregates the information from multiple views, postprocessing the output with a Markov Random Field to ensure smoothness and segmentation of occluded areas. PRIR is not affected by mesh topology and easily captures long-range dependencies with the U-Nets. Our results demonstrate: state-of-the-art accuracy on topologically correct white matter meshes; equally accurate performance on simulated surface scans; and robust segmentation of real surface scans.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"15242 ","pages":"74-84"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12398396/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01. Epub Date: 2024-10-23. DOI: 10.1007/978-3-031-73290-4_12
Krithika Iyer, Shireen Y Elhabian
The study of physiology demonstrates that the form (shape) of anatomical structures dictates their functions, and analyzing the form of anatomies plays a crucial role in clinical research. Statistical shape modeling (SSM) is a widely used tool for quantitative analysis of anatomical forms, aiding in characterizing and identifying differences within a population of subjects. Despite its utility, the conventional SSM construction pipeline is often complex and time-consuming. Additionally, reliance on linearity assumptions further limits the model from capturing clinically relevant variations. Recent advancements in deep learning enable the direct inference of SSM from unsegmented medical images, streamlining the process and improving accessibility. However, these image-based SSM methods do not adequately account for situations where the imaging data quality is poor or where only sparse information is available. Moreover, quantifying aleatoric uncertainty, which represents inherent data variability, is crucial when deploying deep learning for clinical tasks to ensure reliable model predictions and robust decision-making, especially under challenging imaging conditions. Therefore, we propose SPI-CorrNet, a unified model that predicts 3D correspondences from sparse imaging data. It leverages a teacher network to regularize feature learning and quantifies data-dependent aleatoric uncertainty by adapting the network to predict intrinsic input variances. Experiments on the LGE MRI left atrium dataset and the AbdomenCT-1K liver dataset demonstrate that our technique enhances the accuracy and robustness of sparse image-driven SSM.
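A common way to realize the data-dependent aleatoric uncertainty described here is a heteroscedastic regression head trained with a Gaussian negative log-likelihood; the sketch below (PyTorch) shows that general mechanism, with module and function names that are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class CorrespondenceHead(nn.Module):
    """Predicts per-correspondence 3D means and log-variances from an image feature vector."""
    def __init__(self, feat_dim, n_points):
        super().__init__()
        self.mu = nn.Linear(feat_dim, n_points * 3)        # correspondence coordinates
        self.log_var = nn.Linear(feat_dim, n_points * 3)   # per-coordinate log-variance

    def forward(self, feats):
        return self.mu(feats), self.log_var(feats)

def gaussian_nll(mu, log_var, target):
    # Negative log-likelihood of target under N(mu, exp(log_var)); the variance term
    # lets the network down-weight noisy or sparsely observed inputs.
    return (0.5 * torch.exp(-log_var) * (target - mu) ** 2 + 0.5 * log_var).mean()
```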
{"title":"Probabilistic 3D Correspondence Prediction from Sparse Unsegmented Images.","authors":"Krithika Iyer, Shireen Y Elhabian","doi":"10.1007/978-3-031-73290-4_12","DOIUrl":"10.1007/978-3-031-73290-4_12","url":null,"abstract":"<p><p>The study of physiology demonstrates that the form (shape) of anatomical structures dictates their functions, and analyzing the form of anatomies plays a crucial role in clinical research. Statistical shape modeling (SSM) is a widely used tool for quantitative analysis of forms of anatomies, aiding in characterizing and identifying differences within a population of subjects. Despite its utility, the conventional SSM construction pipeline is often complex and time-consuming. Additionally, reliance on linearity assumptions further limits the model from capturing clinically relevant variations. Recent advancements in deep learning solutions enable the direct inference of SSM from unsegmented medical images, streamlining the process and improving accessibility. However, the new methods of SSM from images do not adequately account for situations where the imaging data quality is poor or where only sparse information is available. Moreover, quantifying aleatoric uncertainty, which represents inherent data variability, is crucial in deploying deep learning for clinical tasks to ensure reliable model predictions and robust decision-making, especially in challenging imaging conditions. Therefore, we propose SPI-CorrNet, a unified model that predicts 3D correspondences from sparse imaging data. It leverages a teacher network to regularize feature learning and quantifies data-dependent aleatoric uncertainty by adapting the network to predict intrinsic input variances. Experiments on the LGE MRI left atrium dataset and Abdomen CT-1K liver datasets demonstrate that our technique enhances the accuracy and robustness of sparse image-driven SSM.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"15242 ","pages":"117-127"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01. Epub Date: 2023-10-15. DOI: 10.1007/978-3-031-45673-2_5
Cheng Che Tsai, Xiaoyang Chen, Sahar Ahmad, Pew-Thian Yap
Magnetic resonance imaging (MRI) is commonly used for studying infant brain development. However, due to the lengthy image acquisition time and limited subject compliance, acquiring high-quality infant MRI can be challenging. Without imposing additional burden on image acquisition, image super-resolution (SR) can be used to enhance image quality post-acquisition. Most SR techniques are supervised and trained on multiple aligned low-resolution (LR) and high-resolution (HR) image pairs, which in practice are not usually available. Unlike supervised approaches, Deep Image Prior (DIP) can be employed for unsupervised single-image SR, utilizing solely the input LR image for de novo optimization to produce an HR image. However, determining when to stop DIP training early is non-trivial and presents a challenge to fully automating the SR process. To address this issue, we constrain the low-frequency k-space of the SR image to be similar to that of the LR image. We further improve performance by designing a dual-modal framework that leverages shared anatomical information between T1-weighted and T2-weighted images. We evaluated our model, dual-modal DIP (dmDIP), on infant MRI data acquired from birth to one year of age, demonstrating that enhanced image quality can be obtained with substantially reduced sensitivity to early stopping.
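The low-frequency k-space constraint can be sketched as a simple consistency loss between the SR output and the upsampled LR input (PyTorch); the window fraction and function name are illustrative assumptions, not the paper's exact data-consistency term.

```python
import torch

def low_freq_kspace_loss(sr, lr_up, keep_frac=0.25):
    """sr, lr_up: (B, 1, H, W) images on the same grid; keep_frac sets the size of the
    central k-space window treated as 'low frequency'."""
    K_sr = torch.fft.fftshift(torch.fft.fft2(sr), dim=(-2, -1))
    K_lr = torch.fft.fftshift(torch.fft.fft2(lr_up), dim=(-2, -1))
    H, W = sr.shape[-2:]
    h, w = int(H * keep_frac / 2), int(W * keep_frac / 2)
    cy, cx = H // 2, W // 2
    win = (slice(None), slice(None), slice(cy - h, cy + h), slice(cx - w, cx + w))
    # Penalize disagreement only inside the central (low-frequency) window.
    return (K_sr[win] - K_lr[win]).abs().mean()
```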
{"title":"Robust Unsupervised Super-Resolution of Infant MRI via Dual-Modal Deep Image Prior.","authors":"Cheng Che Tsai, Xiaoyang Chen, Sahar Ahmad, Pew-Thian Yap","doi":"10.1007/978-3-031-45673-2_5","DOIUrl":"10.1007/978-3-031-45673-2_5","url":null,"abstract":"<p><p>Magnetic resonance imaging (MRI) is commonly used for studying infant brain development. However, due to the lengthy image acquisition time and limited subject compliance, high-quality infant MRI can be challenging. Without imposing additional burden on image acquisition, image super-resolution (SR) can be used to enhance image quality post-acquisition. Most SR techniques are supervised and trained on multiple aligned low-resolution (LR) and high-resolution (HR) image pairs, which in practice are not usually available. Unlike supervised approaches, Deep Image Prior (DIP) can be employed for unsupervised single-image SR, utilizing solely the input LR image for <i>de novo</i> optimization to produce an HR image. However, determining when to stop early in DIP training is non-trivial and presents a challenge to fully automating the SR process. To address this issue, we constrain the low-frequency k-space of the SR image to be similar to that of the LR image. We further improve performance by designing a dual-modal framework that leverages shared anatomical information between T1-weighted and T2-weighted images. We evaluated our model, dual-modal DIP (dmDIP), on infant MRI data acquired from birth to one year of age, demonstrating that enhanced image quality can be obtained with substantially reduced sensitivity to early stopping.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14348 ","pages":"42-51"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323077/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The synergy of long-range dependencies from transformers and local representations of image content from convolutional neural networks (CNNs) has led to advanced architectures and increased performance for various medical image analysis tasks due to their complementary benefits. However, compared with CNNs, transformers require considerably more training data, due to a larger number of parameters and an absence of inductive bias. The need for increasingly large datasets continues to be problematic, particularly in the context of medical imaging, where both annotation efforts and data protection result in limited data availability. In this work, inspired by the human decision-making process of correlating new "evidence" with previously memorized "experience", we propose a Memorizing Vision Transformer (MoViT) to alleviate the need for large-scale datasets to successfully train and deploy transformer-based architectures. MoViT leverages an external memory structure to cache historical attention snapshots during the training stage. To prevent overfitting, we incorporate an innovative memory update scheme, attention temporal moving average, to update the stored external memories with the historical moving average. For inference speedup, we design a prototypical attention learning method to distill the external memory into smaller representative subsets. We evaluate our method on a public histology image dataset and an in-house MRI dataset, demonstrating that MoViT, applied to varied medical image analysis tasks, can outperform vanilla transformer models across varied data regimes, especially in cases where only a small amount of annotated data is available. More importantly, MoViT can reach performance competitive with ViT using only 3.0% of the training data. In conclusion, MoViT provides a simple plug-in for transformer architectures which may contribute to reducing the training data needed to achieve acceptable models for a broad range of medical image analysis tasks.
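A minimal sketch of the external-memory idea with an attention temporal moving average is shown below (PyTorch); it omits multi-head projections and the prototypical distillation step, and the class name, memory size, and momentum are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

class MemoryAttention(torch.nn.Module):
    def __init__(self, dim, mem_size, momentum=0.99):
        super().__init__()
        self.register_buffer("mem_k", torch.zeros(mem_size, dim))
        self.register_buffer("mem_v", torch.zeros(mem_size, dim))
        self.momentum = momentum

    @torch.no_grad()
    def update(self, k, v):
        # Temporal moving average: blend cached keys/values with a random subset
        # of the current (detached) batch tokens.
        n = min(self.mem_k.shape[0], k.shape[0])
        idx = torch.randperm(k.shape[0])[:n]
        self.mem_k[:n] = self.momentum * self.mem_k[:n] + (1 - self.momentum) * k[idx]
        self.mem_v[:n] = self.momentum * self.mem_v[:n] + (1 - self.momentum) * v[idx]

    def forward(self, q, k, v):
        # q, k, v: (tokens, dim). Attend over current tokens concatenated with cached memory.
        k_all = torch.cat([k, self.mem_k], dim=0)
        v_all = torch.cat([v, self.mem_v], dim=0)
        attn = F.softmax(q @ k_all.t() / k.shape[-1] ** 0.5, dim=-1)
        if self.training:
            self.update(k.detach(), v.detach())
        return attn @ v_all
```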
{"title":"MoViT: Memorizing Vision Transformers for Medical Image Analysis.","authors":"Yiqing Shen, Pengfei Guo, Jingpu Wu, Qianqi Huang, Nhat Le, Jinyuan Zhou, Shanshan Jiang, Mathias Unberath","doi":"10.1007/978-3-031-45676-3_21","DOIUrl":"10.1007/978-3-031-45676-3_21","url":null,"abstract":"<p><p>The synergy of long-range dependencies from transformers and local representations of image content from convolutional neural networks (CNNs) has led to advanced architectures and increased performance for various medical image analysis tasks due to their complementary benefits. However, compared with CNNs, transformers require considerably more training data, due to a larger number of parameters and an absence of inductive bias. The need for increasingly large datasets continues to be problematic, particularly in the context of medical imaging, where both annotation efforts and data protection result in limited data availability. In this work, inspired by the human decision-making process of correlating new \"evidence\" with previously memorized \"experience\", we propose a Memorizing Vision Transformer (MoViT) to alleviate the need for large-scale datasets to successfully train and deploy transformer-based architectures. MoViT leverages an external memory structure to cache history attention snapshots during the training stage. To prevent overfitting, we incorporate an innovative memory update scheme, attention temporal moving average, to update the stored external memories with the historical moving average. For inference speedup, we design a prototypical attention learning method to distill the external memory into smaller representative subsets. We evaluate our method on a public histology image dataset and an in-house MRI dataset, demonstrating that MoViT applied to varied medical image analysis tasks, can outperform vanilla transformer models across varied data regimes, especially in cases where only a small amount of annotated data is available. More importantly, MoViT can reach a competitive performance of ViT with only 3.0% of the training data. In conclusion, MoViT provides a simple plug-in for transformer architectures which may contribute to reducing the training data needed to achieve acceptable models for a broad range of medical image analysis tasks.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14349 ","pages":"205-213"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11008051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140863811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01. Epub Date: 2023-10-15. DOI: 10.1007/978-3-031-45676-3_15
Boning Tong, Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Jason Moore, Marylyn Ritchie, Li Shen
Alzheimer's disease (AD) leads to irreversible cognitive decline, with Mild Cognitive Impairment (MCI) as its prodromal stage. Early detection of AD and related dementia is crucial for timely treatment and slowing disease progression. However, classifying cognitively normal (CN), MCI, and AD subjects using machine learning models faces class imbalance, necessitating the use of balanced accuracy as a suitable metric. To enhance model performance and balanced accuracy, we introduce a novel method called VS-Opt-Net. This approach incorporates the recently developed vector-scaling (VS) loss into a machine learning pipeline named STREAMLINE. Moreover, it employs Bayesian optimization for hyperparameter learning of both the model and the loss function. VS-Opt-Net not only amplifies the contribution of minority examples in proportion to the imbalance level but also addresses the challenge of generalization in training deep networks. In our empirical study, we use MRI-based brain regional measurements as features to conduct the CN vs MCI and AD vs MCI binary classifications. We compare the balanced accuracy of our model with other machine learning models and deep neural network loss functions that also employ class-balanced strategies. Our findings demonstrate that after hyperparameter optimization, the deep neural network using the VS loss function substantially improves balanced accuracy. It also surpasses other models in performance on the AD dataset. Moreover, our feature importance analysis highlights VS-Opt-Net's ability to elucidate biomarker differences across dementia stages.
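The vector-scaling loss follows the general form of per-class multiplicative and additive logit adjustments derived from class frequencies; a minimal sketch (PyTorch) is given below, where gamma and tau are the hyperparameters one would tune (e.g., with Bayesian optimization) and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def vs_loss(logits, targets, class_counts, gamma=0.3, tau=1.0):
    """VS-style class-balanced cross-entropy.
    logits: (B, C); targets: (B,) class indices; class_counts: (C,) training-set counts."""
    counts = class_counts.float()
    delta = (counts / counts.max()) ** gamma        # multiplicative scaling, shrinks majority-class logits less
    iota = tau * torch.log(counts / counts.sum())   # additive shift, penalizes frequent classes
    adjusted = logits * delta + iota                # broadcast over the class dimension
    return F.cross_entropy(adjusted, targets)
```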
{"title":"Class-Balanced Deep Learning with Adaptive Vector Scaling Loss for Dementia Stage Detection.","authors":"Boning Tong, Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Jason Moore, Marylyn Ritchie, Li Shen","doi":"10.1007/978-3-031-45676-3_15","DOIUrl":"10.1007/978-3-031-45676-3_15","url":null,"abstract":"<p><p>Alzheimer's disease (AD) leads to irreversible cognitive decline, with Mild Cognitive Impairment (MCI) as its prodromal stage. Early detection of AD and related dementia is crucial for timely treatment and slowing disease progression. However, classifying cognitive normal (CN), MCI, and AD subjects using machine learning models faces class imbalance, necessitating the use of balanced accuracy as a suitable metric. To enhance model performance and balanced accuracy, we introduce a novel method called VS-Opt-Net. This approach incorporates the recently developed vector-scaling (VS) loss into a machine learning pipeline named STREAMLINE. Moreover, it employs Bayesian optimization for hyperparameter learning of both the model and loss function. VS-Opt-Net not only amplifies the contribution of minority examples in proportion to the imbalance level but also addresses the challenge of generalization in training deep networks. In our empirical study, we use MRI-based brain regional measurements as features to conduct the CN vs MCI and AD vs MCI binary classifications. We compare the balanced accuracy of our model with other machine learning models and deep neural network loss functions that also employ class-balanced strategies. Our findings demonstrate that after hyperparameter optimization, the deep neural network using the VS loss function substantially improves balanced accuracy. It also surpasses other models in performance on the AD dataset. Moreover, our feature importance analysis highlights VS-Opt-Net's ability to elucidate biomarker differences across dementia stages.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14349 ","pages":"144-154"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10924683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140095306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-01. Epub Date: 2023-10-15. DOI: 10.1007/978-3-031-45676-3_40
Erkun Yang, Cheng Deng, Mingxia Liu
Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to low computation and storage costs, hashing-based search techniques have been widely adopted for establishing image retrieval systems. However, these methods often suffer from non-negligible quantization loss, which can degrade the overall search performance. To address this issue, this paper presents a compact coding solution, namely Deep Bayesian Quantization (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieve superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines deep representation learning and compact representation quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue for traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce quantization losses. By utilizing pre-computed lookup tables, the proposed DBQ enables efficient and effective similarity search. Extensive experiments conducted on 2,008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-art approaches.
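Lookup-table-based search with compact codes can be sketched in the product-quantization style below (NumPy): distances are computed asymmetrically from precomputed tables without decoding the database items. This shows the retrieval mechanism only, not the paper's Bayesian quantization model; all names are assumptions.

```python
import numpy as np

def build_lookup(query, codebooks):
    """codebooks: (M, K, d_sub) learned codewords; the query is split into M sub-vectors
    of length d_sub. Returns an (M, K) table of squared distances from each sub-query
    to each codeword."""
    M, K, d = codebooks.shape
    q = query.reshape(M, d)
    return ((codebooks - q[:, None, :]) ** 2).sum(-1)

def adc_distances(codes, table):
    """codes: (N, M) integer codeword indices for the database items.
    Asymmetric distance = sum of table lookups over sub-spaces (no reconstruction needed)."""
    M = codes.shape[1]
    return table[np.arange(M), codes].sum(-1)   # (N,) distances to the query
```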
{"title":"Deep Bayesian Quantization for Supervised Neuroimage Search.","authors":"Erkun Yang, Cheng Deng, Mingxia Liu","doi":"10.1007/978-3-031-45676-3_40","DOIUrl":"10.1007/978-3-031-45676-3_40","url":null,"abstract":"<p><p>Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to low computation and storage costs, hashing-based search techniques have been widely adopted for establishing image retrieval systems. However, these methods often suffer from nonnegligible quantization loss, which can degrade the overall search performance. To address this issue, this paper presents a compact coding solution namely <i>Deep Bayesian Quantization</i> (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieve superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines the deep representation learning and the representation compact quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue for traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce the quantization losses. By utilizing pre-computed lookup tables, the proposed DBQ can enable efficient and effective similarity search. Extensive experiments conducted on 2, 008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-arts.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14349 ","pages":"396-406"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883338/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-01. Epub Date: 2023-10-15. DOI: 10.1007/978-3-031-45673-2_38
Anees Kazi, Soroush Farghadani, Iman Aganj, Nassir Navab
Interpretability in Graph Convolutional Networks (GCNs) has been explored to some extent in computer vision; yet, in the medical domain, it requires further examination. Most interpretability approaches for GCNs, especially in the medical domain, focus on interpreting the output of the model in a post-hoc fashion. In this paper, we propose an interpretable attention module (IAM) that explains the relevance of the input features to the classification task on a GNN model. The model uses these interpretations to improve its performance. In a clinical scenario, such a model can assist clinical experts in better decision-making for diagnosis and treatment planning. The main novelty lies in the IAM, which directly operates on input features. IAM learns the attention for each feature based on unique interpretability-specific losses. We show the application of our model on two publicly available datasets, Tadpole and the UK Biobank (UKBB). For Tadpole, we choose the task of disease classification; for UKBB, age and sex prediction. Compared to the state of the art, the proposed model achieves an average accuracy increase of 3.2% on Tadpole, 1.6% on UKBB sex prediction, and 2% on UKBB age prediction. Further, we show exhaustive validation and clinical interpretation of our results.
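The idea of an attention module operating directly on input features can be sketched as a learnable per-feature mask with a sparsity penalty (PyTorch); this is a generic illustration with assumed names, not the IAM's exact interpretability-specific losses.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Learns a per-feature relevance weight applied to the inputs before the GNN."""
    def __init__(self, n_features):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        attn = torch.sigmoid(self.logits)   # per-feature relevance in [0, 1]
        return x * attn, attn               # masked features plus the interpretable weights

def interpretability_penalty(attn, lam=1e-3):
    # Encourage a sparse attention vector so the surviving weights are readable
    # as feature relevance; added to the task (classification) loss.
    return lam * attn.abs().sum()
```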
{"title":"IA-GCN: Interpretable Attention based Graph Convolutional Network for Disease Prediction.","authors":"Anees Kazi, Soroush Farghadani, Iman Aganj, Nassir Navab","doi":"10.1007/978-3-031-45673-2_38","DOIUrl":"https://doi.org/10.1007/978-3-031-45673-2_38","url":null,"abstract":"<p><p>Interpretability in Graph Convolutional Networks (GCNs) has been explored to some extent in general in computer vision; yet, in the medical domain, it requires further examination. Most of the interpretability approaches for GCNs, especially in the medical domain, focus on interpreting the output of the model in a <i>post</i>-<i>hoc</i> fashion. In this paper, we propose an interpretable attention module (IAM) that explains the relevance of the input features to the classification task on a GNN Model. The model uses these interpretations to improve its performance. In a clinical scenario, such a model can assist the clinical experts in better decision-making for diagnosis and treatment planning. The main novelty lies in the IAM, which directly operates on input features. IAM learns the attention for each feature based on the unique interpretability-specific losses. We show the application of our model on two publicly available datasets, Tadpole and the UK Biobank (UKBB). For Tadpole we choose the task of disease classification, and for UKBB, age, and sex prediction. The proposed model achieves an increase in an average accuracy of 3.2% for Tadpole and 1.6% for UKBB sex and 2% for the UKBB age prediction task compared to the state-of-the-art. Further, we show exhaustive validation and clinical interpretation of our results.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14348 ","pages":"382-392"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583839/pdf/nihms-1932968.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49685875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-01. Epub Date: 2023-10-15. DOI: 10.1007/978-3-031-45676-3_14
Lanhong Yao, Zheyuan Zhang, Ugur Demir, Elif Keles, Camila Vendrami, Emil Agarunov, Candice Bolan, Ivo Schoots, Marc Bruno, Rajesh Keswani, Frank Miller, Tamas Gonda, Cemal Yazici, Temel Tirkes, Michael Wallace, Concetto Spampinato, Ulas Bagci
Intraductal Papillary Mucinous Neoplasm (IPMN) cysts are pre-malignant pancreas lesions, and they can progress into pancreatic cancer. Therefore, detecting and stratifying their risk level is of utmost importance for effective treatment planning and disease control. However, this is a highly challenging task because of the diverse and irregular shape, texture, and size of the IPMN cysts as well as the pancreas. In this study, we propose a novel computer-aided diagnosis pipeline for IPMN risk classification from multi-contrast MRI scans. Our proposed analysis framework includes an efficient volumetric self-adapting segmentation strategy for pancreas delineation, followed by a newly designed deep learning-based classification scheme with a radiomics-based predictive approach. We test our proposed decision-fusion model on multi-center data sets of 246 multi-contrast MRI scans from five centers and obtain performance superior to the state of the art (SOTA) in this field, reaching 81.9% accuracy. Our ablation studies demonstrate the significance of both the radiomics and deep learning modules for achieving the new SOTA performance compared to international guidelines and published studies (81.9% vs 61.3% in accuracy). These findings have important implications for clinical decision-making. The code is available upon publication.
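A simple form of the decision fusion between a radiomics-based classifier and a deep model is late fusion of class probabilities; the sketch below (scikit-learn/NumPy) uses an assumed random-forest radiomics classifier and an equal-weight average, which are illustrative choices rather than the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse_predictions(radiomics_feats, deep_probs, labels, train_idx, test_idx, w=0.5):
    """radiomics_feats: (N, F) hand-crafted features; deep_probs: (N, C) deep-model
    class probabilities (columns assumed ordered by class index, matching the labels)."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(radiomics_feats[train_idx], labels[train_idx])
    rad_probs = clf.predict_proba(radiomics_feats[test_idx])
    fused = w * rad_probs + (1 - w) * deep_probs[test_idx]   # late decision fusion
    return fused.argmax(1)                                    # fused risk-class predictions
```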
{"title":"Radiomics Boosts Deep Learning Model for IPMN Classification.","authors":"Lanhong Yao, Zheyuan Zhang, Ugur Demir, Elif Keles, Camila Vendrami, Emil Agarunov, Candice Bolan, Ivo Schoots, Marc Bruno, Rajesh Keswani, Frank Miller, Tamas Gonda, Cemal Yazici, Temel Tirkes, Michael Wallace, Concetto Spampinato, Ulas Bagci","doi":"10.1007/978-3-031-45676-3_14","DOIUrl":"10.1007/978-3-031-45676-3_14","url":null,"abstract":"<p><p>Intraductal Papillary Mucinous Neoplasm (IPMN) cysts are pre-malignant pancreas lesions, and they can progress into pancreatic cancer. Therefore, detecting and stratifying their risk level is of ultimate importance for effective treatment planning and disease control. However, this is a highly challenging task because of the diverse and irregular shape, texture, and size of the IPMN cysts as well as the pancreas. In this study, we propose a novel computer-aided diagnosis pipeline for IPMN risk classification from multi-contrast MRI scans. Our proposed analysis framework includes an efficient volumetric self-adapting segmentation strategy for pancreas delineation, followed by a newly designed deep learning-based classification scheme with a radiomics-based predictive approach. We test our proposed decision-fusion model in multi-center data sets of 246 multi-contrast MRI scans and obtain superior performance to the state of the art (SOTA) in this field. Our ablation studies demonstrate the significance of both radiomics and deep learning modules for achieving the new SOTA performance compared to international guidelines and published studies (81.9% vs 61.3% in accuracy). Our findings have important implications for clinical decision-making. In a series of rigorous experiments on multi-center data sets (246 MRI scans from five centers), we achieved unprecedented performance (81.9% accuracy). The code is available upon publication.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14349 ","pages":"134-143"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10810260/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139563708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-site brain magnetic resonance imaging (MRI) has been widely used in clinical and research domains, but is usually sensitive to non-biological variations caused by site effects (e.g., field strengths and scanning protocols). Several retrospective data harmonization methods have shown promising results in removing these non-biological variations at the feature or whole-image level. Most existing image-level harmonization methods are implemented through generative adversarial networks, which are generally computationally expensive and generalize poorly on independent data. To this end, this paper proposes a disentangled latent energy-based style translation (DLEST) framework for image-level structural MRI harmonization. Specifically, DLEST disentangles site-invariant image generation and site-specific style translation via a latent autoencoder and an energy-based model. The autoencoder learns to encode images into a low-dimensional latent space and generates faithful images from latent codes. The energy-based model is placed between the encoding and generation steps, facilitating style translation from a source domain to a target domain implicitly. This allows highly generalizable image generation and efficient style translation through the latent space. We train our model on 4,092 T1-weighted MRIs and evaluate it on 3 tasks: histogram comparison, acquisition site classification, and brain tissue segmentation. Qualitative and quantitative results demonstrate the superiority of our approach, which generally outperforms several state-of-the-art methods.
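Style translation through a latent energy-based model can be sketched as a few Langevin-style gradient steps on a site-conditioned energy function, followed by decoding; in the sketch below (PyTorch), encoder, decoder, and energy_fn are assumed components and the step sizes are illustrative, not the paper's exact sampler.

```python
import torch

def translate_latent(z, energy_fn, target_site, n_steps=20, step=0.01, noise=0.005):
    """Move a latent code z toward the target site's style by noisy gradient descent
    on a site-conditioned energy function (Langevin-style updates)."""
    z = z.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        e = energy_fn(z, target_site).sum()
        grad = torch.autograd.grad(e, z)[0]
        z = (z - step * grad + noise * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()

# Usage (hypothetical components): x_harmonized = decoder(translate_latent(encoder(x), energy_fn, site_id))
```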
{"title":"Structural MRI Harmonization via Disentangled Latent Energy-Based Style Translation.","authors":"Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Weili Lin, Hongtu Zhu, Mingxia Liu","doi":"10.1007/978-3-031-45673-2_1","DOIUrl":"10.1007/978-3-031-45673-2_1","url":null,"abstract":"<p><p>Multi-site brain magnetic resonance imaging (MRI) has been widely used in clinical and research domains, but usually is sensitive to non-biological variations caused by site effects (<i>e.g.</i>, field strengths and scanning protocols). Several retrospective data harmonization methods have shown promising results in removing these non-biological variations at feature or whole-image level. Most existing image-level harmonization methods are implemented through generative adversarial networks, which are generally computationally expensive and generalize poorly on independent data. To this end, this paper proposes a disentangled latent energy-based style translation (DLEST) framework for image-level structural MRI harmonization. Specifically, DLEST disentangles <i>site-invariant image generation</i> and <i>site-specific style translation</i> via a latent autoencoder and an energy-based model. The autoencoder learns to encode images into low-dimensional latent space, and generates faithful images from latent codes. The energy-based model is placed in between the encoding and generation steps, facilitating style translation from a source domain to a target domain implicitly. This allows <i>highly generalizable image generation and efficient style translation</i> through the latent space. We train our model on 4,092 T1-weighted MRIs in 3 tasks: histogram comparison, acquisition site classification, and brain tissue segmentation. Qualitative and quantitative results demonstrate the superiority of our approach, which generally outperforms several state-of-the-art methods.</p>","PeriodicalId":74092,"journal":{"name":"Machine learning in medical imaging. MLMI (Workshop)","volume":"14348 ","pages":"1-11"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883146/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}