Pub Date: 2024-07-01, DOI: 10.1109/TMI.2024.3421360
Jie Zhou, Biao Jie, Zhengdong Wang, Zhixiang Zhang, Tongchun Du, Weixin Bian, Yang Yang, Jun Jia
Analysis of functional connectivity networks (FCNs) derived from resting-state functional magnetic resonance imaging (rs-fMRI) has greatly advanced our understanding of brain diseases, including Alzheimer's disease (AD) and attention deficit hyperactivity disorder (ADHD). Advanced machine learning techniques, such as convolutional neural networks (CNNs), have been used to learn high-level feature representations of FCNs for automated brain disease classification. Although the convolution operations in CNNs are good at extracting local properties of FCNs, they generally fail to capture global temporal representations of FCNs. Recently, the transformer has demonstrated remarkable performance in various tasks, attributed to its self-attention mechanism, which effectively captures global temporal feature representations; however, it cannot effectively model the local network characteristics of FCNs. To this end, we propose a novel network structure for Local sequential feature Coupling Global representation learning (LCGNet), which combines convolutional operations and self-attention mechanisms for enhanced FCN representation learning. Specifically, we first build a dynamic FCN for each subject using an overlapped sliding-window approach. We then construct three sequential components (i.e., an edge-to-vertex layer, a vertex-to-network layer, and a network-to-temporality layer) with a dual backbone of CNN and transformer branches to extract and couple topological information of brain networks from local to global scales. Experimental results on two real datasets (i.e., ADNI and ADHD-200) with rs-fMRI data show the superiority of our LCGNet.
{"title":"LCGNet: Local Sequential Feature Coupling Global Representation Learning for Functional Connectivity Network Analysis with fMRI.","authors":"Jie Zhou, Biao Jie, Zhengdong Wang, Zhixiang Zhang, Tongchun Du, Weixin Bian, Yang Yang, Jun Jia","doi":"10.1109/TMI.2024.3421360","DOIUrl":"https://doi.org/10.1109/TMI.2024.3421360","url":null,"abstract":"<p><p>Analysis of functional connectivity networks (FCNs) derived from resting-state functional magnetic resonance imaging (rs-fMRI) has greatly advanced our understanding of brain diseases, including Alzheimer's disease (AD) and attention deficit hyperactivity disorder (ADHD). Advanced machine learning techniques, such as convolutional neural networks (CNNs), have been used to learn high-level feature representations of FCNs for automated brain disease classification. Even though convolution operations in CNNs are good at extracting local properties of FCNs, they generally cannot well capture global temporal representations of FCNs. Recently, the transformer technique has demonstrated remarkable performance in various tasks, which is attributed to its effective self-attention mechanism in capturing the global temporal feature representations. However, it cannot effectively model the local network characteristics of FCNs. To this end, in this paper, we propose a novel network structure for Local sequential feature Coupling Global representation learning (LCGNet) to take advantage of convolutional operations and self-attention mechanisms for enhanced FCN representation learning. Specifically, we first build a dynamic FCN for each subject using an overlapped sliding window approach. We then construct three sequential components (i.e., edge-to-vertex layer, vertex-to-network layer, and network-to-temporality layer) with a dual backbone branch of CNN and transformer to extract and couple from local to global topological information of brain networks. Experimental results on two real datasets (i.e., ADNI and ADHD-200) with rs-fMRI data show the superiority of our LCGNet.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141478184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-source stationary CT, in which both the detector and the X-ray sources are fixed, is a novel imaging system with high temporal resolution that has garnered significant interest. Limited space within the system restricts the number of X-ray sources, leading to sparse-view CT imaging challenges. Recent diffusion models for reconstructing sparse-view CT have generally focused on either the sinogram or the image domain. Sinogram-centric models effectively estimate missing projections but may introduce artifacts, lacking mechanisms to ensure image correctness. Conversely, image-domain models, while capturing detailed image features, often struggle with complex data distributions, leading to inaccuracies in projections. To address these issues, the Dual-domain Collaborative Diffusion Sampling (DCDS) model integrates sinogram- and image-domain diffusion processes for enhanced sparse-view reconstruction, combining the strengths of both domains in an optimized mathematical framework. A collaborative diffusion mechanism underpins the model, improving sinogram recovery and image generation: it drives feedback-guided image generation from the sinogram domain and uses image-domain results to complete missing projections. The DCDS model is further optimized through an alternating direction iteration method focused on data-consistency updates. Extensive testing, including numerical simulations, real phantoms, and clinical cardiac datasets, demonstrates the DCDS model's effectiveness. It consistently outperforms various state-of-the-art benchmarks, delivering exceptional reconstruction quality and precise sinograms.
{"title":"Dual-Domain Collaborative Diffusion Sampling for Multi-Source Stationary Computed Tomography Reconstruction","authors":"Zirong Li;Dingyue Chang;Zhenxi Zhang;Fulin Luo;Qiegen Liu;Jianjia Zhang;Guang Yang;Weiwen Wu","doi":"10.1109/TMI.2024.3420411","DOIUrl":"10.1109/TMI.2024.3420411","url":null,"abstract":"The multi-source stationary CT, where both the detector and X-ray source are fixed, represents a novel imaging system with high temporal resolution that has garnered significant interest. Limited space within the system restricts the number of X-ray sources, leading to sparse-view CT imaging challenges. Recent diffusion models for reconstructing sparse-view CT have generally focused separately on sinogram or image domains. Sinogram-centric models effectively estimate missing projections but may introduce artifacts, lacking mechanisms to ensure image correctness. Conversely, image-domain models, while capturing detailed image features, often struggle with complex data distribution, leading to inaccuracies in projections. Addressing these issues, the Dual-domain Collaborative Diffusion Sampling (DCDS) model integrates sinogram and image domain diffusion processes for enhanced sparse-view reconstruction. This model combines the strengths of both domains in an optimized mathematical framework. A collaborative diffusion mechanism underpins this model, improving sinogram recovery and image generative capabilities. This mechanism facilitates feedback-driven image generation from the sinogram domain and uses image domain results to complete missing projections. Optimization of the DCDS model is further achieved through the alternative direction iteration method, focusing on data consistency updates. Extensive testing, including numerical simulations, real phantoms, and clinical cardiac datasets, demonstrates the DCDS model’s effectiveness. It consistently outperforms various state-of-the-art benchmarks, delivering exceptional reconstruction quality and precise sinogram.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"43 10","pages":"3398-3411"},"PeriodicalIF":0.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141462763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magnetic Particle Imaging (MPI) is an emerging tomographic modality that allows for precise three-dimensional (3D) mapping of the concentration and distribution of magnetic nanoparticles (MNPs). Although significant progress has been made towards improving MPI since its introduction, scaling it up for human applications has proven challenging. High-quality images have been obtained in animal-scale MPI scanners with gradients up to 7 T/m/μ0; however, for MPI systems with bore diameters around 200 mm, the gradients generated by electromagnets drop significantly, to below 0.5 T/m/μ0. Given the current technological limitations in image reconstruction and the properties of available MNPs, these low gradients inherently limit the resolution achievable for higher-precision medical imaging. Utilizing superconductors stands out as a promising approach for developing a human-scale MPI system. In this study, we introduce, for the first time, a human-scale amplitude-modulated (AM) MPI system with superconductor-based selection coils. The system achieves an unprecedented magnetic field gradient of up to 2.5 T/m/μ0 within a 200 mm bore diameter, enabling a large field of view of 100 × 130 × 98 mm3 at 2.5 T/m/μ0 for 3D imaging. While the achieved spatial resolution is on the order of that of previous animal-scale AM MPI systems, incorporating superconductors to reach such high gradients in a 200 mm bore marks a major step toward clinical MPI.
{"title":"Towards human-scale magnetic particle imaging: development of the first system with superconductor-based selection coils.","authors":"Tuan-Anh Le, Minh Phu Bui, Yaser Hadadian, Khaled Mohamed Gadelmowla, Seungjun Oh, Chaemin Im, Seungyong Hahn, Jungwon Yoon","doi":"10.1109/TMI.2024.3419427","DOIUrl":"https://doi.org/10.1109/TMI.2024.3419427","url":null,"abstract":"<p><p>Magnetic Particle Imaging (MPI) is an emerging tomographic modality that allows for precise three-dimensional (3D) mapping of magnetic nanoparticles (MNPs) concentration and distribution. Although significant progress has been made towards improving MPI since its introduction, scaling it up for human applications has proven challenging. High-quality images have been obtained in animal-scale MPI scanners with gradients up to 7 T/m/μ<sub>0</sub>, however, for MPI systems with bore diameters around 200 mm the gradients generated by electromagnets drop significantly to below 0.5 T/m/μ<sub>0</sub>. Given the current technological limitations in image reconstruction and the properties of available MNPs, these low gradients inherently impose limitations on improving MPI resolution for higher precision medical imaging. Utilizing superconductors stands out as a promising approach for developing a human-scale MPI system. In this study, we introduce, for the first time, a human-scale amplitude-modulated (AM) MPI system with superconductor-based selection coils. The system achieves an unprecedented magnetic field gradient of up to 2.5 T/m/μ<sub>0</sub> within a 200 mm bore diameter, enabling large fields of view of 100 × 130 × 98 mm<sup>3</sup> at 2.5 T/m/μ<sub>0</sub> for 3D imaging. While obtained spatial resolution is in the order of previous animal-scale AM MPIs, incorporating superconductors for achieving such high gradients in a 200 mm bore diameter marks a major step toward clinical MPI.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141461400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-26, DOI: 10.1109/TMI.2024.3419697
De Cai, Jie Chen, Junhan Zhao, Yuan Xue, Sen Yang, Wei Yuan, Min Feng, Haiyan Weng, Shuguang Liu, Yulong Peng, Junyou Zhu, Kanran Wang, Christopher Jackson, Hongping Tang, Junzhou Huang, Xiyue Wang
Cervical cytology is a critical screening strategy for early detection of pre-cancerous and cancerous cervical lesions. The challenge lies in accurately classifying various cervical cytology cell types. Existing automated cervical cytology methods are primarily trained on databases covering a narrow range of coarse-grained cell types, which fail to provide a comprehensive and detailed performance analysis that accurately represents real-world cytopathology conditions. To overcome these limitations, we introduce HiCervix, the most extensive, multi-center cervical cytology dataset currently available to the public. HiCervix includes 40,229 cervical cells from 4,496 whole slide images, categorized into 29 annotated classes. These classes are organized within a three-level hierarchical tree to capture fine-grained subtype information. To exploit the semantic correlation inherent in this hierarchical tree, we propose HierSwin, a hierarchical vision transformer-based classification network. HierSwin serves as a benchmark for detailed feature learning in both coarse-level and fine-level cervical cancer classification tasks. In our comprehensive experiments, HierSwin demonstrated remarkable performance, achieving 92.08% accuracy for coarse-level classification and 82.93% accuracy averaged across all three levels. When compared to board-certified cytopathologists, HierSwin achieved high classification performance (0.8293 versus 0.7359 averaged accuracy), highlighting its potential for clinical applications. This newly released HiCervix dataset, along with our benchmark HierSwin method, is poised to make a substantial impact on the advancement of deep learning algorithms for rapid cervical cancer screening and greatly improve cancer prevention and patient outcomes in real-world clinical settings.
{"title":"HiCervix: An Extensive Hierarchical Dataset and Benchmark for Cervical Cytology Classification.","authors":"De Cai, Jie Chen, Junhan Zhao, Yuan Xue, Sen Yang, Wei Yuan, Min Feng, Haiyan Weng, Shuguang Liu, Yulong Peng, Junyou Zhu, Kanran Wang, Christopher Jackson, Hongping Tang, Junzhou Huang, Xiyue Wang","doi":"10.1109/TMI.2024.3419697","DOIUrl":"https://doi.org/10.1109/TMI.2024.3419697","url":null,"abstract":"<p><p>Cervical cytology is a critical screening strategy for early detection of pre-cancerous and cancerous cervical lesions. The challenge lies in accurately classifying various cervical cytology cell types. Existing automated cervical cytology methods are primarily trained on databases covering a narrow range of coarse-grained cell types, which fail to provide a comprehensive and detailed performance analysis that accurately represents real-world cytopathology conditions. To overcome these limitations, we introduce HiCervix, the most extensive, multi-center cervical cytology dataset currently available to the public. HiCervix includes 40,229 cervical cells from 4,496 whole slide images, categorized into 29 annotated classes. These classes are organized within a three-level hierarchical tree to capture fine-grained subtype information. To exploit the semantic correlation inherent in this hierarchical tree, we propose HierSwin, a hierarchical vision transformer-based classification network. HierSwin serves as a benchmark for detailed feature learning in both coarse-level and fine-level cervical cancer classification tasks. In our comprehensive experiments, HierSwin demonstrated remarkable performance, achieving 92.08% accuracy for coarse-level classification and 82.93% accuracy averaged across all three levels. When compared to board-certified cytopathologists, HierSwin achieved high classification performance (0.8293 versus 0.7359 averaged accuracy), highlighting its potential for clinical applications. This newly released HiCervix dataset, along with our benchmark HierSwin method, is poised to make a substantial impact on the advancement of deep learning algorithms for rapid cervical cancer screening and greatly improve cancer prevention and patient outcomes in real-world clinical settings.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141461398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-26, DOI: 10.1109/TMI.2024.3419134
Wenhui Yang, Shuo Gao, Hao Zhang, Hong Yu, Menglei Xu, Puimun Chong, Weijie Zhang, Hong Wang, Wenjuan Zhang, Airong Qian
Pulmonary Tuberculosis (PTB) is one of the world's most infectious illnesses, and early detection is critical for its prevention. Digital Radiography (DR) has been the most common and effective technique for examining PTB. However, due to the variety and weak specificity of phenotypes on DR chest X-rays (DCRs), it is difficult for radiologists to make reliable diagnoses. Although artificial intelligence has made considerable gains in assisting the diagnosis of PTB, methods are still lacking for identifying PTB lesions that involve few-shot classes and small objects. To solve these problems, geometric data augmentation was used to increase the number of DCRs, and a diffusion probability model was implemented for six few-shot classes. Importantly, we propose PtbNet, a new multi-lesion detector based on RetinaNet, constructed to detect small PTB lesions. The results showed that with the two data augmentations, the number of DCRs increased by 80%, from 570 to 2,859. In pre-evaluation experiments against the RetinaNet baseline, AP improved by 9.9 for the six few-shot classes. Our extensive empirical evaluation showed that PtbNet achieved an AP of 28.2, outperforming nine other state-of-the-art methods. In the ablation study, combining BiFPN+ and PSPD-Conv increased AP by 2.1 and APs by 5.0, with an average gain of 9.8 in APm and APl. In summary, PtbNet not only improves the detection of small-object lesions but also enhances the ability to detect different types of PTB uniformly, which helps physicians diagnose PTB lesions accurately. The code is available at https://github.com/Wenhui-person/PtbNet/tree/master.
{"title":"PtbNet: Based on Local Few-Shot Classes and Small Objects to accurately detect PTB.","authors":"Wenhui Yang, Shuo Gao, Hao Zhang, Hong Yu, Menglei Xu, Puimun Chong, Weijie Zhang, Hong Wang, Wenjuan Zhang, Airong Qian","doi":"10.1109/TMI.2024.3419134","DOIUrl":"https://doi.org/10.1109/TMI.2024.3419134","url":null,"abstract":"<p><p>Pulmonary Tuberculosis (PTB) is one of the world's most infectious illnesses, and its early detection is critical for preventing PTB. Digital Radiography (DR) has been the most common and effective technique to examine PTB. However, due to the variety and weak specificity of phenotypes on DR chest X-ray (DCR), it is difficult to make reliable diagnoses for radiologists. Although artificial intelligence technology has made considerable gains in assisting the diagnosis of PTB, it lacks methods to identify the lesions of PTB with few-shot classes and small objects. To solve these problems, geometric data augmentation was used to increase the size of the DCRs. For this purpose, a diffusion probability model was implemented for six few-shot classes. Importantly, we propose a new multi-lesion detector PtbNet based on RetinaNet, which was constructed to detect small objects of PTB lesions. The results showed that by two data augmentations, the number of DCRs increased by 80% from 570 to 2,859. In the pre-evaluation experiments with the baseline, RetinaNet, the AP improved by 9.9 for six few-shot classes. Our extensive empirical evaluation showed that the AP of PtbNet achieved 28.2, outperforming the other 9 state-of-the-art methods. In the ablation study, combined with BiFPN+ and PSPD-Conv, the AP increased by 2.1, AP<sup>s</sup> increased by 5.0, and grew by an average of 9.8 in AP<sup>m</sup> and AP<sup>l</sup>. In summary, PtbNet not only improves the detection of small-object lesions but also enhances the ability to detect different types of PTB uniformly, which helps physicians diagnose PTB lesions accurately. The code is available at https://github.com/Wenhui-person/PtbNet/tree/master.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141461399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-26, DOI: 10.1109/TMI.2024.3419707
Puyang Wang, Dazhou Guo, Dandan Zheng, Minghui Zhang, Haogang Yu, Xin Sun, Jia Ge, Yun Gu, Le Lu, Xianghua Ye, Dakai Jin
Intrathoracic airway segmentation in computed tomography is a prerequisite for the analysis of various respiratory diseases such as chronic obstructive pulmonary disease, asthma, and lung cancer. Due to the low imaging contrast and the noise exacerbated at peripheral branches, together with the topological complexity and intra-class imbalance of the airway tree, it remains challenging for deep learning-based methods to segment the complete airway tree (i.e., to extract the deeper branches). Unlike other organs with simpler shapes or topology, the airway's complex tree structure makes generating the "ground truth" label extremely burdensome (up to 7 hours of manual or 3 hours of semi-automatic annotation per case). Most existing airway datasets are incompletely labeled, which limits the completeness of computer-segmented airways. In this paper, we propose a new anatomy-aware multi-class airway segmentation method enhanced by topology-guided iterative self-learning. Based on the natural airway anatomy, we formulate a simple yet highly effective anatomy-aware multi-class segmentation task to intuitively handle the severe intra-class imbalance of the airway. To solve the incomplete-labeling issue, we propose a tailored iterative self-learning scheme to segment toward the complete airway tree. To generate pseudo-labels with higher sensitivity (while retaining similar specificity), we introduce a novel breakage attention map and design a topology-guided pseudo-label refinement method that iteratively connects the broken branches commonly present in initial pseudo-labels. Extensive experiments have been conducted on four datasets, including two public challenges. The proposed method achieves the top performance in both the EXACT'09 challenge (by average score) and the ATM'22 challenge (by weighted average score). On a public BAS dataset and a private lung cancer dataset, our method significantly improves on previous leading approaches, extracting at least 6.1% (absolute) more detected tree length and 5.2% more tree branches while maintaining comparable precision.
{"title":"Accurate Airway Tree Segmentation in CT Scans via Anatomy-aware Multi-class Segmentation and Topology-guided Iterative Learning.","authors":"Puyang Wang, Dazhou Guo, Dandan Zheng, Minghui Zhang, Haogang Yu, Xin Sun, Jia Ge, Yun Gu, Le Lu, Xianghua Ye, Dakai Jin","doi":"10.1109/TMI.2024.3419707","DOIUrl":"https://doi.org/10.1109/TMI.2024.3419707","url":null,"abstract":"<p><p>Intrathoracic airway segmentation in computed tomography is a prerequisite for various respiratory disease analyses such as chronic obstructive pulmonary disease, asthma and lung cancer. Due to the low imaging contrast and noises execrated at peripheral branches, the topological-complexity and the intra-class imbalance of airway tree, it remains challenging for deep learning-based methods to segment the complete airway tree (on extracting deeper branches). Unlike other organs with simpler shapes or topology, the airway's complex tree structure imposes an unbearable burden to generate the \"ground truth\" label (up to 7 or 3 hours of manual or semi-automatic annotation per case). Most of the existing airway datasets are incompletely labeled/annotated, thus limiting the completeness of computer-segmented airway. In this paper, we propose a new anatomy-aware multi-class airway segmentation method enhanced by topology-guided iterative self-learning. Based on the natural airway anatomy, we formulate a simple yet highly effective anatomy-aware multi-class segmentation task to intuitively handle the severe intra-class imbalance of the airway. To solve the incomplete labeling issue, we propose a tailored iterative self-learning scheme to segment toward the complete airway tree. For generating pseudo-labels to achieve higher sensitivity (while retaining similar specificity), we introduce a novel breakage attention map and design a topology-guided pseudo-label refinement method by iteratively connecting breaking branches commonly existed from initial pseudo-labels. Extensive experiments have been conducted on four datasets including two public challenges. The proposed method achieves the top performance in both EXACT'09 challenge using average score and ATM'22 challenge on weighted average score. In a public BAS dataset and a private lung cancer dataset, our method significantly improves previous leading approaches by extracting at least (absolute) 6.1% more detected tree length and 5.2% more tree branches, while maintaining comparable precision.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141461397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Available evidence suggests that dynamic functional connectivity can capture time-varying abnormalities in brain activity in resting-state functional magnetic resonance imaging (rs-fMRI) data and has a natural advantage in uncovering mechanisms of abnormal brain activity in schizophrenia (SZ) patients. Hence, an advanced dynamic brain network analysis model, the temporal brain category graph convolutional network (Temporal-BCGCN), was employed. First, a unique dynamic brain network analysis module, DSF-BrainNet, was designed to construct dynamic synchronization features. Subsequently, a novel graph convolution method, TemporalConv, was proposed based on the synchronous temporal properties of the features. Finally, CategoryPool, the first modular deep learning test tool for abnormal hemispheric lateralization based on rs-fMRI data, was proposed. This study was validated on the COBRE and UCLA datasets and achieved average accuracies of 83.62% and 89.71%, respectively, outperforming the baseline model and other state-of-the-art methods. The ablation results also demonstrate the advantages of TemporalConv over the traditional edge-feature graph convolution approach and the improvement of CategoryPool over the classical graph pooling approach. Interestingly, this study showed that the lower-order perceptual system and higher-order network regions in the left hemisphere are more severely dysfunctional than those in the right hemisphere in SZ, reaffirming the importance of the left medial superior frontal gyrus in SZ. Our code is available at: https://github.com/swfen/Temporal-BCGCN.
{"title":"Temporal Dynamic Synchronous Functional Brain Network for Schizophrenia Classification and Lateralization Analysis.","authors":"Cheng Zhu, Ying Tan, Shuqi Yang, Jiaqing Miao, Jiayi Zhu, Huan Huang, Dezhong Yao, Cheng Luo","doi":"10.1109/TMI.2024.3419041","DOIUrl":"10.1109/TMI.2024.3419041","url":null,"abstract":"<p><p>Available evidence suggests that dynamic functional connectivity can capture time-varying abnormalities in brain activity in resting-state cerebral functional magnetic resonance imaging (rs-fMRI) data and has a natural advantage in uncovering mechanisms of abnormal brain activity in schizophrenia (SZ) patients. Hence, an advanced dynamic brain network analysis model called the temporal brain category graph convolutional network (Temporal-BCGCN) was employed. Firstly, a unique dynamic brain network analysis module, DSF-BrainNet, was designed to construct dynamic synchronization features. Subsequently, a revolutionary graph convolution method, TemporalConv, was proposed based on the synchronous temporal properties of features. Finally, the first modular test tool for abnormal hemispherical lateralization in deep learning based on rs-fMRI data, named CategoryPool, was proposed. This study was validated on COBRE and UCLA datasets and achieved 83.62% and 89.71% average accuracies, respectively, outperforming the baseline model and other state-of-theart methods. The ablation results also demonstrate the advantages of TemporalConv over the traditional edge feature graph convolution approach and the improvement of CategoryPool over the classical graph pooling approach. Interestingly, this study showed that the lower-order perceptual system and higher-order network regions in the left hemisphere are more severely dysfunctional than in the right hemisphere in SZ, reaffirmings the importance of the left medial superior frontal gyrus in SZ. Our code was available at: https://github.com/swfen/Temporal-BCGCN.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-24, DOI: 10.1109/TMI.2024.3418652
Yongyi Shi, Wenjun Xia, Ge Wang, Xuanqin Mou
Lowering the radiation dose per view and using sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intriguing direction involves developing BIQA methods that mimic the operational characteristics of the human visual system (HVS). The internal generative mechanism (IGM) theory holds that the HVS actively infers primary content to enhance comprehension. In this study, we introduce an innovative BIQA metric that emulates the active inference process of the IGM. First, an active inference module, implemented as a denoising diffusion probabilistic model (DDPM), is constructed to predict the primary content. Then, a dissimilarity map is derived by assessing the interrelation between the distorted image and its primary content. Finally, the distorted image and the dissimilarity map are combined into a multi-channel image, which is fed into a transformer-based image quality evaluator. By leveraging the DDPM-derived primary content, our approach achieves competitive performance on a low-dose CT dataset.
{"title":"Blind CT Image Quality Assessment Using DDPM-Derived Content and Transformer-Based Evaluator","authors":"Yongyi Shi;Wenjun Xia;Ge Wang;Xuanqin Mou","doi":"10.1109/TMI.2024.3418652","DOIUrl":"10.1109/TMI.2024.3418652","url":null,"abstract":"Lowering radiation dose per view and utilizing sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intriguing direction involves developing BIQA methods that mimic the operational characteristic of the human visual system (HVS). The internal generative mechanism (IGM) theory reveals that the HVS actively deduces primary content to enhance comprehension. In this study, we introduce an innovative BIQA metric that emulates the active inference process of IGM. Initially, an active inference module, implemented as a denoising diffusion probabilistic model (DDPM), is constructed to anticipate the primary content. Then, the dissimilarity map is derived by assessing the interrelation between the distorted image and its primary content. Subsequently, the distorted image and dissimilarity map are combined into a multi-channel image, which is inputted into a transformer-based image quality evaluator. By leveraging the DDPM-derived primary content, our approach achieves competitive performance on a low-dose CT dataset.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"43 10","pages":"3559-3569"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-24, DOI: 10.1109/TMI.2024.3418408
Pengyu Wang, Huaqi Zhang, Yixuan Yuan
Multi-modal prompt learning is a high-performance and cost-effective learning paradigm that learns text as well as image prompts to tune pre-trained vision-language (V-L) models such as CLIP for multiple downstream tasks. However, recent methods typically treat text and image prompts as independent components without considering the dependency between prompts. Moreover, extending multi-modal prompt learning into the medical field poses challenges due to a significant gap between general- and medical-domain data. To this end, we propose a Multi-modal Collaborative Prompt Learning (MCPL) pipeline to tune a frozen V-L model for aligning medical text-image representations, thereby enabling medical downstream tasks. We first construct the anatomy-pathology (AP) prompt for multi-modal prompting jointly with text and image prompts. The AP prompt introduces instance-level anatomy and pathology information, helping the V-L model better comprehend medical reports and images. Next, we propose a graph-guided prompt collaboration module (GPCM), which explicitly establishes multi-way couplings between the AP, text, and image prompts, enabling collaborative production and updating of multi-modal prompts for more effective prompting. Finally, we develop a novel prompt configuration scheme, which attaches the AP prompt to the query and key, and the text/image prompt to the value, in self-attention layers to improve the interpretability of multi-modal prompts. Extensive experiments on numerous medical classification and object detection datasets show that the proposed pipeline achieves excellent effectiveness and generalization. Compared with state-of-the-art prompt learning methods, MCPL provides a more reliable multi-modal prompt paradigm for reducing the tuning costs of V-L models on medical downstream tasks. Our code: https://github.com/CUHK-AIM-Group/MCPL.
{"title":"MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model.","authors":"Pengyu Wang, Huaqi Zhang, Yixuan Yuan","doi":"10.1109/TMI.2024.3418408","DOIUrl":"10.1109/TMI.2024.3418408","url":null,"abstract":"<p><p>Multi-modal prompt learning is a high-performance and cost-effective learning paradigm, which learns text as well as image prompts to tune pre-trained vision-language (V-L) models like CLIP for adapting multiple downstream tasks. However, recent methods typically treat text and image prompts as independent components without considering the dependency between prompts. Moreover, extending multi-modal prompt learning into the medical field poses challenges due to a significant gap between general- and medical-domain data. To this end, we propose a Multi-modal Collaborative Prompt Learning (MCPL) pipeline to tune a frozen V-L model for aligning medical text-image representations, thereby achieving medical downstream tasks. We first construct the anatomy-pathology (AP) prompt for multi-modal prompting jointly with text and image prompts. The AP prompt introduces instance-level anatomy and pathology information, thereby making a V-L model better comprehend medical reports and images. Next, we propose graph-guided prompt collaboration module (GPCM), which explicitly establishes multi-way couplings between the AP, text, and image prompts, enabling collaborative multi-modal prompt producing and updating for more effective prompting. Finally, we develop a novel prompt configuration scheme, which attaches the AP prompt to the query and key, and the text/image prompt to the value in self-attention layers for improving the interpretability of multi-modal prompts. Extensive experiments on numerous medical classification and object detection datasets show that the proposed pipeline achieves excellent effectiveness and generalization. Compared with state-of-the-art prompt learning methods, MCPL provides a more reliable multi-modal prompt paradigm for reducing tuning costs of V-L models on medical downstream tasks. Our code: https://github.com/CUHK-AIM-Group/MCPL.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-24, DOI: 10.1109/TMI.2024.3418838
Yanyang Wang, Zirong Li, Weiwen Wu
The score-based generative model (SGM) has received significant attention in the field of medical imaging, particularly in the context of limited-angle computed tomography (LACT). Traditional SGM approaches achieve robust reconstruction performance by incorporating a substantial number of sampling steps during the inference phase, so these established SGM-based methods require a large computational cost to reconstruct a single case. The main challenge lies in achieving high-quality images with rapid sampling while preserving sharp edges and small features. In this study, we propose an innovative rapid-sampling strategy for SGMs, which we name the time-reversion fast-sampling (TIFA) score-based model for LACT reconstruction. The entire sampling procedure adheres to the principles of robust optimization theory and is firmly grounded in a comprehensive mathematical model. TIFA's rapid-sampling mechanism comprises several essential components, including jump sampling, time-reversion with re-sampling, and compressed sampling. In the initial jump-sampling stage, multiple sampling steps are bypassed to quickly obtain preliminary results. Subsequently, during the time-reversion process, the preliminary results are deliberately corrupted with small-scale noise, and the re-sampling process then refines these corrupted results. Finally, compressed sampling fine-tunes the refined results by imposing a regularization term. Quantitative and qualitative assessments on numerical simulations, a real physical phantom, and clinical cardiac datasets demonstrate that the TIFA method (using 200 steps) outperforms other state-of-the-art methods (using 2,000 steps) for the available [0°, 90°] and [0°, 60°] scanning ranges. Furthermore, experimental results underscore that our TIFA method continues to reconstruct high-quality images even with only 10 steps. Our code is available at https://github.com/tianzhijiaoziA/TIFADiffusion.
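To illustrate the sampling-schedule ideas named in the abstract (jump sampling, then time-reversion with re-sampling), here is a toy sketch with a trivial stand-in denoiser; the step counts and noise scales are made-up values, and the compressed-sampling regularization step is omitted.

```python
# Rough sketch of jump sampling and time-reversion with re-sampling:
# skip most reverse-diffusion steps, lightly re-noise the intermediate result,
# then re-sample over a short tail of the schedule. The denoiser is a stub.
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):                      # stub for one learned reverse step
    return x * 0.98

x = rng.standard_normal(256)                 # initial noise "image"

# Jump sampling: visit only every 10th step of a 2000-step schedule (200 visits).
for t in range(1999, -1, -10):
    x = denoise_step(x, t)

# Time-reversion: corrupt the preliminary result with small-scale noise...
x = x + 0.05 * rng.standard_normal(256)
# ...then re-sample over a short tail of the schedule to refine it.
for t in range(99, -1, -10):
    x = denoise_step(x, t)
```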