CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation
Junding Sun, Xiaopeng Zheng, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang
https://doi.org/10.1002/ima.23184

The Transformer has shown immense potential in medical image segmentation owing to its ability to capture long-range dependencies through self-attention, but it lacks the capability to model local relationships between pixels, so many previous approaches embed the Transformer into a CNN encoder. Current methods, however, often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and in the loss of features for small targets. Furthermore, the high complexity of the Transformer makes it challenging to integrate local and global information effectively within the same scale. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascading to redesign how the CNN and Transformer are combined, strengthening the modeling of interrelationships between multi-scale features. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interaction: it not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head channel attention mechanism so that it models contextual information in feature maps from multiple perspectives along the channel dimension. Together, these designs enable CasUNeXt to better integrate local and global information and to capture relationships between multi-scale features, thereby improving medical image segmentation. In experimental comparisons on several benchmark datasets, CasUNeXt outperforms current state-of-the-art methods.
{"title":"CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation","authors":"Junding Sun, Xiaopeng Zheng, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang","doi":"10.1002/ima.23184","DOIUrl":"https://doi.org/10.1002/ima.23184","url":null,"abstract":"<p>Due to the Transformer's ability to capture long-range dependencies through Self-Attention, it has shown immense potential in medical image segmentation. However, it lacks the capability to model local relationships between pixels. Therefore, many previous approaches embedded the Transformer into the CNN encoder. However, current methods often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and the loss of features for small targets. Furthermore, due to the high complexity of the Transformer, it is challenging to integrate local and global information within the same scale effectively. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascade to redesign the way CNN and Transformer are combined to enhance modeling the unique interrelationships between multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions. It not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism to enable it to model context information in feature maps from multiple perspectives within the channel dimension. These design features collectively enable CasUNeXt to better integrate local and global information and capture relationships between multi-scale features, thereby improving the performance of medical image segmentation. Through experimental comparisons on various benchmark datasets, our CasUNeXt method exhibits outstanding performance in medical image segmentation tasks, surpassing the current state-of-the-art methods.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"34 5","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.23184","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142276596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

ConvNext Mixer-Based Encoder Decoder Method for Nuclei Segmentation in Histopathology Images
Hüseyin Firat, Hüseyin Üzen, Davut Hanbay, Abdulkadir Şengür
https://doi.org/10.1002/ima.23181
Histopathology, vital in diagnosing medical conditions and especially in cancer research, relies on analyzing histopathology images (HIs). Nuclei segmentation, a key task, involves precisely identifying cell nuclei boundaries. Manual segmentation by pathologists is time-consuming, prompting the need for robust automated methods, and the inherent complexity of HIs makes segmentation challenging, necessitating advanced techniques. Recent advances in deep learning, particularly convolutional neural networks (CNNs), have transformed nuclei segmentation. This study emphasizes feature extraction, introducing the ConvNext Mixer-based Encoder-Decoder (CNM-ED) model. Unlike traditional CNN-based models, CNM-ED extracts both spatial and long-range context features to address the complexities of histopathology images. It follows a multi-path strategy: one path uses a traditional CNN architecture, while additional paths extract customized long-range context features using the ConvNext Mixer block, which combines ConvMixer and ConvNext blocks. Fusing these diverse features in the final segmentation output improves accuracy and performance beyond existing state-of-the-art segmentation models. Moreover, this multi-level feature extraction strategy proves more effective than models built on self-attention mechanisms, such as SwinUnet and TransUnet, which have been widely used in recent years. Experiments on five datasets (TNBC, MoNuSeg, CoNSeP, CPM17, and CryoNuSeg) compare the proposed CNM-ED model against various CNN-based models from the literature using accuracy, AJI, macro F1 score, macro intersection over union, macro precision, and macro recall; CNM-ED achieves highly successful results across all metrics. In comparison with state-of-the-art models, CNM-ED stands out as a promising advance in nuclei segmentation that addresses the intricacies of histopathological images, demonstrating enhanced diagnostic capabilities and the potential for significant progress in medical research.
{"title":"ConvNext Mixer-Based Encoder Decoder Method for Nuclei Segmentation in Histopathology Images","authors":"Hüseyin Firat, Hüseyin Üzen, Davut Hanbay, Abdulkadir Şengür","doi":"10.1002/ima.23181","DOIUrl":"https://doi.org/10.1002/ima.23181","url":null,"abstract":"<p>Histopathology, vital in diagnosing medical conditions, especially in cancer research, relies on analyzing histopathology images (HIs). Nuclei segmentation, a key task, involves precisely identifying cell nuclei boundaries. Manual segmentation by pathologists is time-consuming, prompting the need for robust automated methods. Challenges in segmentation arise from HI complexities, necessitating advanced techniques. Recent advancements in deep learning, particularly Convolutional Neural Networks (CNNs), have transformed nuclei segmentation. This study emphasizes feature extraction, introducing the ConvNext Mixer-based Encoder-Decoder (CNM-ED) model. Unlike traditional CNN based models, the proposed CNM-ED model enables the extraction of spatial and long context features to address the inherent complexities of histopathology images. This method leverages a multi-path strategy using a traditional CNN architecture as well as different paths focused on obtaining customized long context features using the ConvNext Mixer block structure that combines ConvMixer and ConvNext blocks. The fusion of these diverse features in the final segmentation output enables improved accuracy and performance, surpassing existing state-of-the-art segmentation models. Moreover, our multi-level feature extraction strategy is more effective than models using self-attention mechanisms such as SwinUnet and TransUnet, which have been frequently used in recent years. Experimental studies were conducted using five different datasets (TNBC, MoNuSeg, CoNSeP, CPM17, and CryoNuSeg) to analyze the performance of the proposed CNM-ED model. Comparisons were made with various CNN based models in the literature using evaluation metrics such as accuracy, AJI, macro F1 score, macro intersection over union, macro precision, and macro recall. It was observed that the proposed CNM-ED model achieved highly successful results across all metrics. Through comparisons with state-art-of models from the literature, the proposed CNM-ED model stands out as a promising advancement in nuclei segmentation, addressing the intricacies of histopathological images. The model demonstrates enhanced diagnostic capabilities and holds the potential for significant progress in medical research.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"34 5","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.23181","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142276595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Enhanced Deformation Vector Field Generation With Diffusion Models and Mamba-Based Network for Registration Performance Enhancement
Zengan Huang, Shan Gao, Xiaxia Yu, Liangjia Zhu, Yi Gao
https://doi.org/10.1002/ima.23171
Recent advances in deformable image registration (DIR) include both supervised and unsupervised deep learning techniques. However, supervised methods are limited by the quality of the ground-truth deformation vector fields (DVFs), while unsupervised approaches often yield suboptimal results because they rely on indirect dissimilarity metrics; moreover, both struggle to model long-range dependencies effectively. This study proposes a novel DIR method that integrates the advantages of supervised and unsupervised learning and tackles the long-range dependency problem, thereby improving registration results. Specifically, we propose a DVF-generation diffusion model that enhances DVF diversity and thereby facilitates the fusion of the supervised and unsupervised paradigms, allowing the method to leverage the benefits of both. Furthermore, a multi-scale frequency-weighted denoising module is integrated to improve the quality of generated DVFs and the registration accuracy. Additionally, we propose a novel MambaReg network that adeptly manages long-range dependencies, further improving registration outcomes. Experimental evaluation on four public datasets demonstrates that our method outperforms several state-of-the-art supervised and unsupervised techniques, with both qualitative and quantitative comparisons highlighting its superior performance.
{"title":"Enhanced Deformation Vector Field Generation With Diffusion Models and Mamba-Based Network for Registration Performance Enhancement","authors":"Zengan Huang, Shan Gao, Xiaxia Yu, Liangjia Zhu, Yi Gao","doi":"10.1002/ima.23171","DOIUrl":"https://doi.org/10.1002/ima.23171","url":null,"abstract":"<p>Recent advancements in deformable image registration (DIR) have seen the emergence of supervised and unsupervised deep learning techniques. However, supervised methods are limited by the quality of deformation vector fields (DVFs), while unsupervised approaches often yield suboptimal results due to their reliance on indirect dissimilarity metrics. Moreover, both methods struggle to effectively model long-range dependencies. This study proposes a novel DIR method that integrates the advantages of supervised and unsupervised learning and tackle issues related to long-range dependencies, thereby improving registration results. Specifically, we propose a DVF generation diffusion model to enhance DVFs diversity, which could be used to facilitate the integration of supervised and unsupervised learning approaches. This fusion allows the method to leverage the benefits of both paradigms. Furthermore, a multi-scale frequency-weighted denoising module is integrated to enhance DVFs generation quality and improve the registration accuracy. Additionally, we propose a novel MambaReg network that adeptly manages long-range dependencies, further optimizing registration outcomes. Experimental evaluation of four public data sets demonstrates that our method outperforms several state-of-the-art techniques based on either supervised or unsupervised learning. Qualitative and quantitative comparisons highlight the superior performance of our approach.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"34 5","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.23171","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142273012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Hybrid Convolutional Neural Network Model for the Classification of Multi-Class Skin Cancer
Ahmet Nusret Toprak, Ibrahim Aruk
https://doi.org/10.1002/ima.23180

Skin cancer is a significant public health issue, making accurate and early diagnosis crucial. This study proposes a novel and efficient hybrid deep-learning model for accurate skin cancer diagnosis. The model first employs DeepLabV3+ to precisely segment skin lesions in dermoscopic images. Features are then extracted with three pretrained models, MobileNetV2, EfficientNetB0, and DenseNet201, to ensure balanced performance and robust feature learning. The extracted features are concatenated, and the ReliefF algorithm selects the most relevant ones. Finally, the selected features are classified into eight categories (actinic keratosis, basal cell carcinoma, benign keratosis, dermatofibroma, melanoma, melanocytic nevus, squamous cell carcinoma, and vascular lesion) using the kNN algorithm. The proposed model achieves an F1 score of 93.49% and an accuracy of 94.42% on the ISIC-2019 dataset, surpassing the best individual model, EfficientNetB0, by 1.20%. Evaluation on the PH2 dataset yields an F1 score of 94.43% and an accuracy of 94.44%, confirming its generalizability. These findings mark the proposed model as a fast, accurate, and valuable tool for early skin cancer detection, and they indicate that combining different CNN models achieves superior results over individual models.
{"title":"A Hybrid Convolutional Neural Network Model for the Classification of Multi-Class Skin Cancer","authors":"Ahmet Nusret Toprak, Ibrahim Aruk","doi":"10.1002/ima.23180","DOIUrl":"https://doi.org/10.1002/ima.23180","url":null,"abstract":"<p>Skin cancer is a significant public health issue, making accurate and early diagnosis crucial. This study proposes a novel and efficient hybrid deep-learning model for accurate skin cancer diagnosis. The model first employs DeepLabV3+ for precise segmentation of skin lesions in dermoscopic images. Feature extraction is then carried out using three pretrained models: MobileNetV2, EfficientNetB0, and DenseNet201 to ensure balanced performance and robust feature learning. These extracted features are then concatenated, and the ReliefF algorithm is employed to select the most relevant features. Finally, obtained features are classified into eight categories: actinic keratosis, basal cell carcinoma, benign keratosis, dermatofibroma, melanoma, melanocytic nevus, squamous cell carcinoma, and vascular lesion using the kNN algorithm. The proposed model achieves an <i>F</i>1 score of 93.49% and an accuracy of 94.42% on the ISIC-2019 dataset, surpassing the best individual model, EfficientNetB0, by 1.20%. Furthermore, the evaluation of the PH2 dataset yielded an <i>F</i>1 score of 94.43% and an accuracy of 94.44%, confirming its generalizability. These findings signify the potential of the proposed model as an expedient, accurate, and valuable tool for early skin cancer detection. They also indicate combining different CNN models achieves superior results over the results obtained from individual models.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"34 5","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.23180","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142273010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}