Pub Date: 2024-04-24 | DOI: 10.1016/j.compmedimag.2024.102387
Chunsu Park , Jeong-Woon Kang , Doen-Eon Lee , Wookon Son , Sang-Min Lee , Chankue Park , MinWoo Kim
Dual-energy computed tomography (CT) is an excellent substitute for magnetic resonance imaging in identifying bone marrow edema. However, it is rarely used in practice owing to its low contrast. To overcome this problem, we constructed a framework based on deep learning techniques to screen for diseases using axial bone images and to identify the local positions of bone lesions. To address the limited availability of labeled samples, we developed a new generative adversarial network (GAN) that extends expressions beyond conventional augmentation (CA) methods based on geometric transformations. We determined, theoretically and experimentally, that combining the concepts of data augmentation optimized for GAN training (DAG) and the Wasserstein GAN yields considerably more stable generation of synthetic images and effectively aligns their distribution with that of real images, achieving a high degree of similarity. The classification model was trained on both real and synthetic samples. Consequently, the GAN technique improved the F1 score in the diagnostic test by approximately 7.8% compared with CA. The final F1 score was 80.24%, with recall and precision of 84.3% and 88.7%, respectively. The results obtained using the augmented samples outperformed those obtained using real samples alone. In addition, we adopted explainable AI techniques that leverage a class activation map (CAM) and principal component analysis to facilitate visual analysis of the network's results. The framework suggests an attention map and a scatter plot to visually explain the disease predictions of the network.
Title: W-DRAG: A joint framework of WGAN with data random augmentation optimized for generative networks for bone marrow edema detection in dual energy CT
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0895611124000648/pdfft?md5=340b576800836a42ff054a8829a2c44e&pid=1-s2.0-S0895611124000648-main.pdf
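The paired-augmentation idea at the heart of DAG-style GAN training can be sketched outside any deep learning framework: the same random geometric transform must be applied to the real and the generated batch, so the discriminator never sees the augmentation itself as a real-versus-fake cue. A minimal NumPy sketch, assuming a flip/rotate transform set (the paper's exact augmentation menu and WGAN loss are not reproduced here):

```python
import numpy as np

def paired_augment(real, fake, rng):
    """Apply the SAME random geometric transform to the real and the
    generated batch (DAG-style), so augmentation cannot leak into the
    discriminator as a real-vs-fake cue. The flip/rotate set here is an
    assumed example, not the paper's exact augmentation menu."""
    k = int(rng.integers(0, 4))        # shared 90-degree rotation count
    flip = rng.random() < 0.5          # shared horizontal flip
    def transform(x):
        x = np.rot90(x, k, axes=(-2, -1))
        return x[..., ::-1].copy() if flip else x.copy()
    return transform(real), transform(fake)

rng = np.random.default_rng(0)
real = np.arange(16, dtype=float).reshape(1, 4, 4)   # toy "real" batch
fake = np.zeros((1, 4, 4))                           # toy "generated" batch
r_aug, f_aug = paired_augment(real, fake, rng)
```

Because one transform is drawn per call and reused for both batches, the augmented pair stays comparable; a per-batch draw would break that invariant.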
Pub Date: 2024-04-19 | DOI: 10.1016/j.compmedimag.2024.102386
Md Navid Akbar , Sebastian F. Ruf , Ashutosh Singh , Razieh Faghihpirayesh , Rachael Garner , Alexis Bennett , Celina Alba , Marianna La Rocca , Tales Imbiriba , Deniz Erdoğmuş , Dominique Duncan
A late post-traumatic seizure (LPTS), a consequence of traumatic brain injury (TBI), can evolve into a lifelong condition known as post-traumatic epilepsy (PTE). The mechanism that triggers epileptogenesis in TBI patients remains elusive, prompting the epilepsy community to devise ways to predict which TBI patients will develop PTE and to identify potential biomarkers. In response to this need, our study collected comprehensive, longitudinal multimodal data from 48 TBI patients across multiple participating institutions. A supervised binary classification task was created, contrasting data from LPTS patients with those without LPTS. To accommodate missing modalities in some subjects, we took a two-pronged approach. First, we extended a graphical-model-based Bayesian estimator to directly classify subjects with incomplete modalities. Second, we explored conventional imputation techniques. The imputed multimodal information was then combined, following several fusion and dimensionality reduction techniques from the literature, and subsequently fitted to a kernel- or tree-based classifier. For this fusion, we proposed two new algorithms: recursive elimination of correlated components (RECC), which filters information based on the correlation between already selected features, and information decomposition and selective fusion (IDSF), which effectively recombines information from decomposed multimodal features. Our cross-validation findings showed that the proposed IDSF algorithm delivers superior performance based on the area under the curve (AUC) score. Ultimately, after rigorous statistical comparisons and interpretable machine learning examination using Shapley values of the most frequently selected features, we recommend the following two magnetic resonance imaging (MRI) abnormalities as potential biomarkers: the left anterior limb of the internal capsule in diffusion MRI (dMRI) and the right middle temporal gyrus in functional MRI (fMRI).
Title: Advancing post-traumatic seizure classification and biomarker identification: Information decomposition based multimodal fusion and explainable machine learning with missing neuroimaging data
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
Pub Date: 2024-04-18 | DOI: 10.1016/j.compmedimag.2024.102385
Ye-Jun Gong , Yue-Ke Li , Rongrong Zhou , Zhan Liang , Yingying Zhang , Tingting Cheng , Zi-Jian Zhang
Due to the high expenses involved, 4D-CT data for some patients may include only five respiratory phases (0%, 20%, 40%, 60%, and 80%). This limitation can affect subsequent radiotherapy planning because lung tumor information is absent for the remaining five respiratory phases (10%, 30%, 50%, 70%, and 90%). This study aims to develop an interpolation method that automatically derives tumor boundary contours for the five omitted phases from the available 5-phase 4D-CT data. Dynamic mode decomposition (DMD) is a data-driven, model-free technique for extracting dynamic information from high-dimensional data; it can reconstruct long-term dynamic patterns from only a limited number of time snapshots. The quasi-periodic motion of a deformable lung tumor under respiration makes it well suited to DMD analysis. However, applying DMD directly to the tumor's respiratory motion is impractical because the tumor is three-dimensional and spans multiple CT slices. To predict the respiratory movement of lung tumors, we therefore developed uniform angular interval (UAI) sampling to generate snapshot vectors of equal length suitable for DMD analysis. The effectiveness of this approach was confirmed by applying the UAI-DMD method to the 4D-CT data of ten patients with lung cancer. The results indicate that the UAI-DMD method effectively approximates the lung tumor's deformable boundary surface and nonlinear motion trajectories. The estimated tumor centroid is within 2 mm of the manually delineated centroid, a smaller error margin than that of traditional B-spline interpolation, which has a 3 mm margin. This methodology can potentially be extended to reconstruct the 20-phase respiratory movement of a lung tumor from dynamic features of 10-phase 4D-CT data, enabling more accurate estimation of the planning target volume (PTV).
Title: A novel approach for estimating lung tumor motion based on dynamic features in 4D-CT
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
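The snapshot-based reconstruction that DMD enables can be illustrated with the standard exact-DMD algorithm (this is textbook DMD, not the UAI sampling pipeline above): fit a low-rank linear operator A with X2 ≈ A·X1 from consecutive snapshot pairs and read the dynamics off its eigenvalues.

```python
import numpy as np

def dmd(X, r):
    """Exact DMD: given a snapshot matrix X (state x time), fit a rank-r
    linear operator A with X[:, 1:] ~= A @ X[:, :-1] and return the
    eigenvalues and modes of A."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    Atilde = U.conj().T @ X2 @ Vh.conj().T / s     # operator projected onto POD modes
    eigvals, W = np.linalg.eig(Atilde)
    modes = X2 @ Vh.conj().T / s @ W               # DMD modes in full space
    return eigvals, modes

# Synthetic quasi-periodic motion: a pure rotation of 0.5 rad per step,
# standing in for one oscillation cycle of a tumor centroid.
t = np.arange(20)
X = np.vstack([np.cos(0.5 * t), np.sin(0.5 * t)])
lam, _ = dmd(X, r=2)
```

For this exactly linear toy system, the recovered eigenvalues lie on the unit circle at phase ±0.5 rad, i.e. DMD identifies the oscillation frequency from the snapshots alone.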
Pub Date: 2024-04-17 | DOI: 10.1016/j.compmedimag.2024.102383
Jiayi Zhu , Bart Bolsterlee , Brian V.Y. Chow , Yang Song , Erik Meijering
Semi-supervised learning has made significant progress in medical image segmentation. However, existing methods primarily utilize information from a single dimensionality, resulting in sub-optimal performance on challenging magnetic resonance imaging (MRI) data with multiple segmentation objects and anisotropic resolution. To address this issue, we present a Hybrid Dual Mean-Teacher (HD-Teacher) model that combines hybrid, semi-supervised, and multi-task learning to achieve effective semi-supervised segmentation. HD-Teacher employs a 2D and a 3D mean-teacher network to produce segmentation labels and signed distance fields from the hybrid information captured in both dimensionalities. This hybrid mechanism allows HD-Teacher to utilize features from 2D, 3D, or both dimensions as needed. Outputs from the 2D and 3D teacher models are dynamically combined based on confidence scores, forming a single hybrid prediction with estimated uncertainty. We propose a hybrid regularization module that encourages both student models to produce results close to the uncertainty-weighted hybrid prediction, further improving their feature extraction capability. Extensive binary and multi-class segmentation experiments on three MRI datasets demonstrated that the proposed framework can (1) significantly outperform state-of-the-art semi-supervised methods, (2) surpass a fully supervised VNet trained on substantially more annotated data, and (3) perform on par with human raters on muscle and bone segmentation tasks. Code will be available at https://github.com/ThisGame42/Hybrid-Teacher.
Title: Hybrid dual mean-teacher network with double-uncertainty guidance for semi-supervised segmentation of magnetic resonance images
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0895611124000600/pdfft?md5=7ce6bdbb1f79301198bf452b8d9fd71f&pid=1-s2.0-S0895611124000600-main.pdf
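The mean-teacher mechanism this model builds on rests on one update rule: the teacher's weights track an exponential moving average (EMA) of the student's weights. A minimal sketch of that standard rule (HD-Teacher's 2D/3D branches and uncertainty weighting are not modeled here):

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher rule: teacher weights are an exponential moving
    average of the student's weights after each optimization step."""
    return {name: alpha * w + (1.0 - alpha) * student[name]
            for name, w in teacher.items()}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, alpha=0.9)   # teacher drifts toward student
```

The smoothing factor alpha controls how slowly the teacher follows the student; values near 1 give the stable, averaged targets that make teacher predictions useful as pseudo-labels.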
Pub Date: 2024-04-16 | DOI: 10.1016/j.compmedimag.2024.102382
Hamed Aghapanah , Reza Rasti , Saeed Kermani , Faezeh Tabesh , Hossein Yousefi Banaem , Hamidreza Pour Aliakbar , Hamid Sanei , William Paul Segars
Cardiovascular MRI (CMRI) is a non-invasive imaging technique adopted for assessing the structure and function of the blood circulatory system. Precise image segmentation is required to measure cardiac parameters and diagnose abnormalities from CMRI data. Because of anatomical heterogeneity and image variations, cardiac image segmentation is a challenging task. Quantification of cardiac parameters requires high-performance segmentation of the left ventricle (LV), right ventricle (RV), and left ventricular myocardium from the background. Manual segmentation of these regions is possible but time-consuming and error-prone. Many semi- or fully automatic solutions have therefore been proposed recently, among which deep learning-based methods have shown high performance in segmenting regions in CMRI data. In this study, a self-adaptive multi-attention (SMA) module is introduced to adaptively leverage multiple attention mechanisms for better segmentation. The SMA integrates convolution-based position and channel attention mechanisms with a patch-tokenization-based vision transformer (ViT) attention mechanism in a hybrid, end-to-end manner. The CNN- and ViT-based attentions mine short- and long-range dependencies, respectively, for more precise segmentation. The SMA module is applied in an encoder-decoder structure with a ResNet50 backbone, named CardSegNet. Furthermore, a deep supervision method with multiple loss functions is introduced to the CardSegNet optimizer to reduce overfitting and enhance the model's performance. The proposed model is validated on ACDC2017 (n=100), M&Ms (n=321), and a local dataset (n=22) using 10-fold cross-validation, yielding promising segmentation results that demonstrate its superiority over its counterparts.
Title: CardSegNet: An adaptive hybrid CNN-vision transformer model for heart region segmentation in cardiac MRI
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
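The channel-attention branch mentioned in the abstract can be illustrated with a generic squeeze-and-excitation-style gate. This is a stand-in for the idea, not the SMA module itself; `w1` and `w2` are hypothetical learned weights:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel gate for a (C, H, W) feature
    map: global-average-pool each channel, pass the pooled vector
    through a tiny MLP, and rescale each channel by a sigmoid gate."""
    z = feat.mean(axis=(1, 2))                  # squeeze: (C,)
    h = np.maximum(w1 @ z, 0.0)                 # excitation, ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # per-channel weights in (0, 1)
    return feat * gate[:, None, None]

feat = np.ones((2, 2, 2))
w1, w2 = np.eye(2), np.zeros((2, 2))            # hypothetical learned weights
out = channel_attention(feat, w1, w2)           # zero logits give gates of 0.5
```

The gate lets the network amplify informative channels and suppress noisy ones; position and ViT attention play the analogous role along spatial and token dimensions.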
Pub Date: 2024-04-16 | DOI: 10.1016/j.compmedimag.2024.102378
Luyu Tang , Songhui Diao , Chao Li , Miaoxia He , Kun Ru , Wenjian Qin
Current methods for digital pathology images typically employ small image patches to learn local representative features, thereby sidestepping heavy computation and memory limitations. However, such methods do not fully consider the global contextual features of whole-slide images (WSIs). Here, we designed a hybrid model, called TransGNN, that combines a Graph Neural Network (GNN) module and a Transformer module to represent global contextual features. The GNN module builds a WSI graph over the foreground area of a WSI to explicitly capture structural features, while the Transformer module implicitly learns global context through its self-attention mechanism. Hepatocellular carcinoma (HCC) prognostic biomarkers were used to illustrate the importance of global contextual information in cancer histopathological analysis. Our model was validated using 362 WSIs from 355 HCC patients in The Cancer Genome Atlas (TCGA). It showed impressive performance, with a concordance index (C-index) of 0.7308 (95% confidence interval (CI): 0.6283–0.8333) for overall survival prediction, the best among all compared models. Additionally, our model achieved areas under the curve of 0.7904, 0.8087, and 0.8004 for 1-year, 3-year, and 5-year survival prediction, respectively. We further verified the superior performance of our model in HCC risk stratification and its clinical value through Kaplan–Meier curves and univariate and multivariate Cox regression analyses. Our research demonstrated that TransGNN effectively utilized the context information of WSIs and contributed to the clinical prognostic evaluation of HCC.
Title: Global contextual representation via graph-transformer fusion for hepatocellular carcinoma prognosis in whole-slide images
Journal: Computerized Medical Imaging and Graphics (IF 5.7)
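The concordance index reported above (0.7308) is, in its standard Harrell form, the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times. A self-contained sketch of that computation:

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell's concordance index: over pairs where the earlier time is
    an observed event, score 1 if the earlier failure received the
    higher predicted risk and 0.5 for tied risks."""
    concordant = comparable = 0.0
    for i in range(len(time)):
        for j in range(len(time)):
            if time[i] < time[j] and event[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

time = np.array([2.0, 4.0, 6.0])     # toy follow-up times
event = np.array([1, 1, 1])          # all events observed
risk = np.array([0.9, 0.5, 0.1])     # risks perfectly ordered with outcome
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why 0.7308 indicates useful but imperfect prognostic discrimination.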
Pub Date: 2024-04-12 | DOI: 10.1016/j.compmedimag.2024.102381
Wenhao Zhong , Heye Zhang , Zhifan Gao , William Kongto Hau , Guang Yang , Xiujian Liu , Lin Xu
Vascular structure segmentation in intravascular ultrasound (IVUS) images plays an important role in the pre-procedural evaluation of percutaneous coronary intervention (PCI). However, vascular structure segmentation in IVUS images faces the challenge of structure-dependent distractions, which fall into two cases: structural intrinsic distractions and inter-structural distractions. Traditional machine learning methods often rely solely on low-level features and overlook high-level features, which limits their generalization. Existing semantic segmentation methods integrate low-level and high-level features to enhance generalization performance, but they also introduce additional interference, which hinders the resolution of structural intrinsic distractions. Distraction cue methods attempt to address structural intrinsic distractions by removing interference from the features through a unique decoder, but they tend to overlook inter-structural distractions. In this paper, we propose distraction-aware hierarchical learning (DHL) for vascular structure segmentation in IVUS images. Inspired by distraction cue methods that remove interference in a decoder, the DHL is designed as a hierarchical decoder that gradually removes structure-dependent distractions through three stages: a global perception process, a distraction perception process, and a structural perception process. The first two stages remove structural intrinsic distractions, and the third removes inter-structural distractions. In the global perception process, the DHL searches for the coarse structural region of the vascular structures on each slice of the IVUS sequence. In the distraction perception process, the DHL progressively refines this coarse region to remove structural distractions. In the structural perception process, the DHL detects regions of inter-structural distraction in the fused structure features and then separates them. Extensive experiments on 361 subjects show that the DHL is effective (e.g., the average Dice is greater than 0.95) and superior to ten state-of-the-art IVUS vascular structure segmentation methods.
{"title":"Distraction-aware hierarchical learning for vascular structure segmentation in intravascular ultrasound images","authors":"Wenhao Zhong , Heye Zhang , Zhifan Gao , William Kongto Hau , Guang Yang , Xiujian Liu , Lin Xu","doi":"10.1016/j.compmedimag.2024.102381","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2024.102381","url":null,"abstract":"<div><p>Vascular structure segmentation in intravascular ultrasound (IVUS) images plays an important role in pre-procedural evaluation of percutaneous coronary intervention (PCI). However, vascular structure segmentation in IVUS images has the challenge of structure-dependent distractions. Structure-dependent distractions are categorized into two cases, structural intrinsic distractions and inter-structural distractions. Traditional machine learning methods often rely solely on low-level features, overlooking high-level features. This way limits the generalization of these methods. The existing semantic segmentation methods integrate low-level and high-level features to enhance generalization performance. But these methods also introduce additional interference, which is harmful to solving structural intrinsic distractions. Distraction cue methods attempt to address structural intrinsic distractions by removing interference from the features through a unique decoder. However, they tend to overlook the problem of inter-structural distractions. In this paper, we propose distraction-aware hierarchical learning (DHL) for vascular structure segmentation in IVUS images. Inspired by distraction cue methods for removing interference in a decoder, the DHL is designed as a hierarchical decoder that gradually removes structure-dependent distractions. The DHL includes global perception process, distraction perception process and structural perception process. 
The global perception process and distraction perception process remove structural intrinsic distractions then the structural perception process removes inter-structural distractions. In the global perception process, the DHL searches for the coarse structural region of the vascular structures on the slice of IVUS sequence. In the distraction perception process, the DHL progressively refines the coarse structural region of the vascular structures to remove structural distractions. In the structural perception process, the DHL detects regions of inter-structural distractions in fused structure features then separates them. Extensive experiments on 361 subjects show that the DHL is effective (e.g., the average Dice is greater than 0.95), and superior to ten state-of-the-art IVUS vascular structure segmentation methods.</p></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":null,"pages":null},"PeriodicalIF":5.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140618092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
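The average Dice figure quoted above follows the standard Dice similarity coefficient for binary segmentation masks. As a minimal illustrative sketch (not the authors' code; the function name and the flat-list mask representation are assumptions):

```python
def dice(pred, target):
    """Dice similarity coefficient between two flat binary masks (lists of 0/1).

    Dice = 2 * |pred ∩ target| / (|pred| + |target|); by convention two
    empty masks are treated as a perfect match.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total
```

An average Dice above 0.95, as reported, means the predicted vascular masks overlap almost completely with the ground truth across the 361 subjects.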
Pub Date : 2024-04-12 DOI: 10.1016/j.compmedimag.2024.102380
Xiaoguang Li , Yichao Zhou , Hongxia Yin , Pengfei Zhao , Ruowei Tang , Han Lv , Yating Qin , Li Zhuo , Zhenchang Wang
The absence of the bone wall over the jugular bulb and sigmoid sinus of the temporal bone is one of the important causes of pulsatile tinnitus. Automatic and accurate detection of these abnormal signs in CT slices has important theoretical significance and clinical value. Owing to the shortage of abnormal samples, class imbalance, small inter-class differences, and low interpretability, existing deep-learning methods are greatly challenged. In this paper, we propose a sub-features orthogonal decoupling model that effectively disentangles representation features into class-specific sub-features and class-independent sub-features in a latent space. The former contain the discriminative information, while the latter preserve the information needed for image reconstruction. In addition, the proposed method can generate image samples via category conversion by combining different class-specific sub-features with the class-independent sub-features, achieving a corresponding mapping between deep features and images of specific classes. The proposed model improves the interpretability of the deep model and provides image synthesis methods for downstream tasks. The effectiveness of the method was verified in detecting bone wall absence in the temporal bone jugular bulb and sigmoid sinus.
{"title":"Sub-features orthogonal decoupling: Detecting bone wall absence via a small number of abnormal examples for temporal CT images","authors":"Xiaoguang Li , Yichao Zhou , Hongxia Yin , Pengfei Zhao , Ruowei Tang , Han Lv , Yating Qin , Li Zhuo , Zhenchang Wang","doi":"10.1016/j.compmedimag.2024.102380","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2024.102380","url":null,"abstract":"<div><p>The absence of bone wall located in the jugular bulb and sigmoid sinus of the temporal bone is one of the important reasons for pulsatile tinnitus. Automatic and accurate detection of these abnormal singes in CT slices has important theoretical significance and clinical value. Due to the shortage of abnormal samples, imbalanced samples, small inter-class differences, and low interpretability, existing deep-learning methods are greatly challenged. In this paper, we proposed a sub-features orthogonal decoupling model, which can effectively disentangle the representation features into class-specific sub-features and class-independent sub-features in a latent space. The former contains the discriminative information, while, the latter preserves information for image reconstruction. In addition, the proposed method can generate image samples using category conversion by combining the different class-specific sub-features and the class-independent sub-features, achieving corresponding mapping between deep features and images of specific classes. The proposed model improves the interpretability of the deep model and provides image synthesis methods for downstream tasks. 
The effectiveness of the method was verified in the detection of bone wall absence in the temporal bone jugular bulb and sigmoid sinus.</p></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":null,"pages":null},"PeriodicalIF":5.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140552526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
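A common way to encourage the kind of orthogonal decoupling described above is to penalize the inner product between the class-specific and class-independent sub-feature vectors during training. The paper does not state its exact loss, so the following is only a hedged sketch of that generic idea (function and argument names are illustrative):

```python
def orthogonality_penalty(class_specific, class_independent):
    """Squared inner product of the two sub-feature vectors.

    The penalty is zero exactly when the class-specific and
    class-independent sub-features are orthogonal in the latent space,
    so minimizing it pushes the two representations apart.
    """
    dot = sum(a * b for a, b in zip(class_specific, class_independent))
    return dot * dot
```

In practice such a term is added to the task loss with a small weight, so the encoder is rewarded for routing discriminative information and reconstruction information into separate, mutually orthogonal sub-features.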
Deep learning (DL) has demonstrated an innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, i.e., the presence of similar or repetitive information, can occur when multiple images present the disease of interest in highly similar ways. Moreover, the augmentation methods commonly used to generate variety in DL training can limit performance when applied indiscriminately to such data. We therefore hypothesize that semantic redundancy tends to lower performance and limit generalizability to unseen data, and we question its impact on classifier performance even with large datasets. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data, and we demonstrate, using the publicly available NIH chest X-ray dataset, that a model trained on the resulting informative subset of the training data significantly outperforms a model trained on the full training set, in both internal (recall: 0.7164 vs. 0.6597, p<0.05) and external testing (recall: 0.3185 vs. 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection over the conventional practice of using all available training data.
{"title":"Semantically redundant training data removal and deep model classification performance: A study with chest X-rays","authors":"Sivaramakrishnan Rajaraman, Ghada Zamzmi , Feng Yang , Zhaohui Liang, Zhiyun Xue, Sameer Antani","doi":"10.1016/j.compmedimag.2024.102379","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2024.102379","url":null,"abstract":"<div><p>Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Also, the common use of augmentation methods to generate variety in DL training could limit performance when indiscriminately applied to such data. We hypothesize that semantic redundancy would therefore tend to lower performance and limit generalizability to unseen data and question its impact on classifier performance even with large data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). 
Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.</p></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":null,"pages":null},"PeriodicalIF":5.7,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0895611124000569/pdfft?md5=6892a4c80999a323e6edf07480aef597&pid=1-s2.0-S0895611124000569-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140545845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
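The entropy-based scoring idea can be sketched with the standard Shannon entropy of a model's predicted class probabilities: low-entropy samples carry little new information and are candidates for removal. This is a generic sketch of the concept, not the paper's exact scoring pipeline (`entropy_score`, `informative_subset`, and the thresholding rule are illustrative assumptions):

```python
import math

def entropy_score(probs):
    """Shannon entropy (in nats) of a predicted class-probability vector.

    Higher entropy = less redundant with what the model already knows.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)

def informative_subset(samples, scores, threshold):
    """Keep only samples whose entropy score exceeds the threshold;
    low-scoring (redundant-looking) samples are dropped from training."""
    return [s for s, sc in zip(samples, scores) if sc > threshold]
```

A uniform two-class prediction [0.5, 0.5] scores ln 2 ≈ 0.693 (maximally informative), while a confident prediction like [1.0, 0.0] scores 0 and would be filtered out first.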
Pub Date : 2024-03-29 DOI: 10.1016/j.compmedimag.2024.102375
Chia-Feng Juang , Ya-Wen Chuang , Guan-Wen Lin , I-Fang Chung , Ying-Chih Lo
Glomerulus morphology on renal pathology images provides valuable information for diagnosis and outcome prediction. To provide better care, an efficient, standardized, and scalable method is urgently needed to optimize the time-consuming and labor-intensive interpretation process performed by renal pathologists. This paper proposes a deep convolutional neural network (CNN)-based approach to automatically detect and classify glomeruli with different stains in renal pathology images. In the glomerulus detection stage, this paper proposes a flattened Xception with a feature pyramid network (FX-FPN), employed as the backbone in a faster region-based CNN framework to improve detection performance. In the classification stage, this paper considers the classification of five glomerulus morphologies using a flattened Xception classifier. To endow the classifier with higher discriminability, this paper proposes a generative data augmentation approach for patch-based glomerulus morphology augmentation: new glomerulus patches of different morphologies are generated through the cycle-consistent generative adversarial network (CycleGAN). The single detection model achieves an F1 score of up to 0.9524 on H&E and PAS stains. With the original training data, the flattened Xception classifier achieves an average sensitivity and specificity of 0.7077 and 0.9316, respectively; with generative data augmentation, these increase to 0.7623 and 0.9443. Comparisons with different deep CNN models show the effectiveness and superiority of the proposed approach.
{"title":"Deep learning-based glomerulus detection and classification with generative morphology augmentation in renal pathology images","authors":"Chia-Feng Juang , Ya-Wen Chuang , Guan-Wen Lin , I-Fang Chung , Ying-Chih Lo","doi":"10.1016/j.compmedimag.2024.102375","DOIUrl":"10.1016/j.compmedimag.2024.102375","url":null,"abstract":"<div><p>Glomerulus morphology on renal pathology images provides valuable diagnosis and outcome prediction information. To provide better care, an efficient, standardized, and scalable method is urgently needed to optimize the time-consuming and labor-intensive interpretation process by renal pathologists. This paper proposes a deep convolutional neural network (CNN)-based approach to automatically detect and classify glomeruli with different stains in renal pathology images. In the glomerulus detection stage, this paper proposes a flattened Xception with a feature pyramid network (FX-FPN). The FX-FPN is employed as a backbone in the framework of faster region-based CNN to improve glomerulus detection performance. In the classification stage, this paper considers classifications of five glomerulus morphologies using a flattened Xception classifier. To endow the classifier with higher discriminability, this paper proposes a generative data augmentation approach for patch-based glomerulus morphology augmentation. New glomerulus patches of different morphologies are generated for data augmentation through the cycle-consistent generative adversarial network (CycleGAN). The single detection model shows the <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span> score up to 0.9524 in H&E and PAS stains. The classification result shows that the average sensitivity and specificity are 0.7077 and 0.9316, respectively, by using the flattened Xception with the original training data. The sensitivity and specificity increase to 0.7623 and 0.9443, respectively, by using the generative data augmentation. 
Comparisons with different deep CNN models show the effectiveness and superiority of the proposed approach.</p></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":null,"pages":null},"PeriodicalIF":5.7,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140404349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
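The sensitivity and specificity figures quoted above follow the standard confusion-matrix definitions. As a minimal sketch (the function name is illustrative, not from the paper):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): recall on the positive class.
    Specificity = TN / (TN + FP): recall on the negative class."""
    return tp / (tp + fn), tn / (tn + fp)
```

For example, 7 true positives with 3 false negatives and 9 true negatives with 1 false positive yield a sensitivity of 0.7 and a specificity of 0.9, mirroring how the per-morphology averages in the abstract are obtained.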