
Latest Publications from International Journal of Imaging Systems and Technology

RetNet30: A Novel Stacked Convolution Neural Network Model for Automated Retinal Disease Diagnosis
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-25 | DOI: 10.1002/ima.23187
Krishnakumar Subramaniam, Archana Naganathan

Automated diagnosis of retinal diseases holds significant promise in enhancing healthcare efficiency and patient outcomes. However, existing methods often lack the accuracy and efficiency required for timely disease detection. To address this gap, we introduce RetNet30, a novel stacked convolutional neural network (CNN) designed to revolutionize automated retinal disease diagnosis. RetNet30 combines a custom-built 30-layer CNN with a fine-tuned Inception V3 model, integrating these sub-models through logistic regression to achieve superior classification performance. Extensive evaluations on retinal image datasets such as DRIVE, STARE, CHASE_DB1, and HRF demonstrate significant improvements in accuracy, sensitivity, specificity, and area under the ROC curve (AUROC) when compared to conventional approaches. By leveraging advanced deep learning architectures, RetNet30 not only enhances diagnostic precision but also generalizes effectively across diverse datasets, establishing a new benchmark in retinal disease classification. This novel approach offers a highly efficient and reliable solution for early disease detection and patient management, addressing the limitations of manual examination methods. Through rigorous quantitative and qualitative assessments, our proposed method demonstrates its potential to significantly impact medical image analysis and improve healthcare outcomes. RetNet30 marks a major step forward in automated retinal disease diagnosis, showcasing the future of AI-driven advancements in ophthalmology.
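
The paper's code is not shown here; as a rough sketch of the stacking step the abstract describes, the following assumes two base models that each output class probabilities, which a logistic-regression meta-learner combines. All names, shapes, and data are placeholders.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_val, n_classes = 200, 4                             # stand-in validation set
    p_cnn30 = rng.dirichlet(np.ones(n_classes), n_val)    # custom 30-layer CNN probabilities
    p_incv3 = rng.dirichlet(np.ones(n_classes), n_val)    # fine-tuned Inception V3 probabilities
    y_val = rng.integers(0, n_classes, n_val)             # ground-truth labels

    # Concatenate the two probability vectors as meta-features for logistic regression.
    X_meta = np.hstack([p_cnn30, p_incv3])
    meta = LogisticRegression(max_iter=1000).fit(X_meta, y_val)
    final_pred = meta.predict(X_meta)                     # in practice, predict on held-out images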

Citations: 0
Cross-Layer Connection SegFormer Attention U-Net for Efficient TRUS Image Segmentation
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23178
Yongtao Shi, Wei Du, Chao Gao, Xinzhi Li

Accurately and rapidly segmenting the prostate in transrectal ultrasound (TRUS) images remains challenging due to the complex semantic information in ultrasound images. This paper presents a cross-layer connection SegFormer attention U-Net for efficient TRUS image segmentation. The SegFormer framework is enhanced by reducing model parameters and complexity without sacrificing accuracy. We introduce layer-skipping connections for precise positioning and combine local context with global dependency for superior feature recognition. The decoder is improved with a Multi-layer Perceptual Convolutional Block Attention Module (MCBAM) for better upsampling and reduced information loss, leading to increased accuracy. The experimental results show that, compared with classic or popular deep learning methods, this method has better segmentation performance, with a Dice similarity coefficient (DSC) of 97.55% and an intersection over union (IoU) of 95.23%. This approach balances encoder efficiency, multi-layer information flow, and parameter reduction.
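
The MCBAM internals are not given in the abstract; below is a minimal PyTorch sketch of a generic CBAM-style block (channel attention followed by spatial attention), the family of modules that MCBAM extends. Layer sizes are assumptions.

    import torch
    import torch.nn as nn

    class CBAMBlock(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.mlp = nn.Sequential(                      # shared channel-attention MLP
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels))
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):
            b, c, _, _ = x.shape
            avg = self.mlp(x.mean(dim=(2, 3)))             # global average pooling branch
            mx = self.mlp(x.amax(dim=(2, 3)))              # global max pooling branch
            x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
            s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(s))      # spatial attention map

    x = torch.randn(2, 64, 32, 32)
    print(CBAMBlock(64)(x).shape)                          # torch.Size([2, 64, 32, 32])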

Citations: 0
Revolutionizing Colon Histopathology Glandular Segmentation Using an Ensemble Network With Watershed Algorithm
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23179
Bijoyeta Roy, Mousumi Gupta, Bidyut Krishna Goswami

Colorectal adenocarcinoma, the most prevalent form of colon cancer, originates in the glandular structures of the intestines, presenting histopathological abnormalities in affected tissues. Accurate gland segmentation is crucial for identifying these potentially fatal abnormalities. While recent methodologies have shown success in segmenting glands in benign tissues, their efficacy diminishes when applied to malignant tissue segmentation. This study aims to develop a robust learning algorithm using a convolutional neural network (CNN) to segment glandular structures in colon histology images. The methodology employs a CNN based on the U-Net architecture, augmented by a weighted ensemble network that integrates DenseNet 169, Inception V3, and Efficientnet B3 as backbone models. Additionally, the segmented gland boundaries are refined using the watershed algorithm. Evaluation on the Warwick-QU dataset demonstrates promising results for the ensemble model, achieving F1 scores of 0.928 and 0.913, object Dice coefficients of 0.923 and 0.911, and Hausdorff distances of 38.97 and 33.76 on test sets A and B, respectively. These results are compared with outcomes from the GlaS challenge (MICCAI 2015) and existing research findings. Furthermore, our model is validated with a publicly available dataset named LC25000, and visual inspection reveals promising results, further validating the efficacy of our approach. The proposed ensemble methodology underscores the advantages of amalgamating diverse models, highlighting the potential of ensemble techniques to enhance segmentation tasks beyond individual model capabilities.
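
As a hedged illustration of the two post-processing ideas named above (a weighted ensemble of probability maps, then watershed refinement of gland boundaries), the sketch below uses placeholder weights and random maps rather than the paper's trained backbones.

    import numpy as np
    from scipy import ndimage
    from skimage.segmentation import watershed

    def ensemble_mask(prob_maps, weights, thresh=0.5):
        # Weighted average of per-model foreground probabilities, then threshold.
        fused = np.average(np.stack(prob_maps), axis=0, weights=weights)
        return fused > thresh

    def refine_with_watershed(mask):
        dist = ndimage.distance_transform_edt(mask)           # distance to background
        markers, _ = ndimage.label(dist > 0.5 * dist.max())   # seeds at gland cores
        return watershed(-dist, markers, mask=mask)           # split touching glands

    probs = [np.random.rand(128, 128) for _ in range(3)]      # stand-ins for 3 backbones
    labels = refine_with_watershed(ensemble_mask(probs, weights=[0.4, 0.3, 0.3]))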

Citations: 0
Enhancement of Semantic Segmentation by Image-Level Fine-Tuning to Overcome Image Pattern Imbalance in HRCT of Diffuse Infiltrative Lung Diseases
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23188
Sungwon Ham, Beomhee Park, Jihye Yun, Sang Min Lee, Joon Beom Seo, Namkug Kim

Diagnosing diffuse infiltrative lung diseases (DILD) using high-resolution computed tomography (HRCT) is challenging, even for expert radiologists, due to the complex and variable image patterns. Moreover, the imbalances among the six key DILD-related patterns—normal, ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation—further complicate accurate segmentation and diagnosis. This study presents an enhanced U-Net-based segmentation technique aimed at addressing these challenges. The primary contribution of our work is the fine-tuning of the U-Net model using image-level labels from 92 HRCT images that include various types of DILDs, such as cryptogenic organizing pneumonia, usual interstitial pneumonia, and nonspecific interstitial pneumonia. This approach helps to correct the imbalance among image patterns, improving the model's ability to accurately differentiate between them. By employing semantic lung segmentation and patch-level machine learning, the fine-tuned model demonstrated improved agreement with radiologists' evaluations compared to conventional methods. This suggests a significant enhancement in both segmentation accuracy and inter-observer consistency. In conclusion, the fine-tuned U-Net model offers a more reliable tool for HRCT image segmentation, making it a valuable imaging biomarker for guiding treatment decisions in patients with DILD. By addressing the issue of pattern imbalances, our model significantly improves the accuracy of DILD diagnosis, which is crucial for effective patient care.
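
The abstract does not spell out the fine-tuning recipe; one standard way to counter pattern imbalance in segmentation, shown here purely as an assumed illustration, is inverse-frequency class weighting of the loss over the six DILD patterns. The pixel counts below are invented.

    import torch
    import torch.nn as nn

    # Assumed per-pattern pixel frequencies: normal, ground-glass, reticular,
    # honeycombing, emphysema, consolidation.
    pixel_counts = torch.tensor([9e6, 2e6, 1e6, 4e5, 8e5, 3e5])
    weights = pixel_counts.sum() / (len(pixel_counts) * pixel_counts)
    criterion = nn.CrossEntropyLoss(weight=weights.float())

    logits = torch.randn(2, 6, 64, 64)         # U-Net output over the 6 patterns
    target = torch.randint(0, 6, (2, 64, 64))  # per-pixel pattern labels
    loss = criterion(logits, target)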

Citations: 0
CafeNet: A Novel Multi-Scale Context Aggregation and Multi-Level Foreground Enhancement Network for Polyp Segmentation
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23183
Zhanlin Ji, Xiaoyu Li, Zhiwu Wang, Haiyang Zhang, Na Yuan, Xueji Zhang, Ivan Ganchev

The detection of polyps plays a significant role in colonoscopy examinations, cancer diagnosis, and early patient treatment. However, due to the diversity in the size, color, and shape of polyps, as well as the presence of low image contrast with the surrounding mucosa and fuzzy boundaries, precise polyp segmentation remains a challenging task. Furthermore, this task requires excellent real-time performance to promptly and efficiently present predictive results to doctors during colonoscopy examinations. To address these challenges, a novel neural network, called CafeNet, is proposed in this paper for rapid and accurate polyp segmentation. CafeNet utilizes newly designed multi-scale context aggregation (MCA) modules to adapt to the extensive variations in polyp morphology, covering small to large polyps by fusing simplified global contextual information and local information at different scales. Additionally, the proposed network utilizes newly designed multi-level foreground enhancement (MFE) modules to compute and extract differential features between adjacent layers and uses the prediction output from the adjacent lower-layer decoder as a guidance map to enhance the polyp information extracted by the upper-layer encoder, thereby improving the contrast between polyps and the background. The polyp segmentation performance of the proposed CafeNet network is evaluated on five benchmark public datasets using six evaluation metrics. Experimental results indicate that CafeNet outperforms the state-of-the-art networks, while also exhibiting the least parameter count along with excellent real-time operational speed.
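
The MCA module's internals are not described here; the sketch below shows a generic multi-scale context aggregation pattern (parallel dilated convolutions fused by a 1x1 convolution) under assumed dilation rates, not the paper's exact design.

    import torch
    import torch.nn as nn

    class MCASketch(nn.Module):
        def __init__(self, channels, rates=(1, 2, 4)):
            super().__init__()
            # One 3x3 branch per dilation rate; padding = rate keeps spatial size.
            self.branches = nn.ModuleList(
                [nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates])
            self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

        def forward(self, x):
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

    print(MCASketch(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])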

Citations: 0
An Automatic Measurement Method of the Tibial Deformity Angle on X-Ray Films Based on Deep Learning Keypoint Detection Network
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23190
Ning Zhao, Cheng Chang, Yuanyuan Liu, Xiao Li, Zicheng Song, Yue Guo, Jianwen Chen, Hao Sun

In the clinical application of the parallel external fixator, medical practitioners are required to quantify deformity parameters to develop corrective strategies. However, manual measurement of deformity angles is a complex and time-consuming process that is susceptible to subjective factors, resulting in nonreproducible results. Accordingly, this study proposes an automatic measurement method based on deep learning, comprising three stages: tibial segment localization, tibial contour point detection, and deformity angle calculation. First, the Faster R-CNN object detection model, combined with ResNet50 and FPN as the backbone, was employed to achieve accurate localization of tibial segments under both occluded and nonoccluded conditions. Subsequently, a relative position constraint loss function was added, and ResNet101 was used as the backbone, resulting in an improved RTMPose keypoint detection model that achieved precise detection of tibial contour points. Ultimately, the bone axes of each tibial segment were determined based on the coordinates of the contour points, and the deformity angles were calculated. The enhanced keypoint detection model, Con_RTMPose, elevated the Percentage of Correct Keypoints (PCK) from 63.94% of the initial model to 87.17%, markedly augmenting keypoint localization precision. Compared to manual measurements conducted by medical professionals, the proposed methodology demonstrates an average error of 0.52°, a maximum error of 1.15°, and a standard deviation of 0.07, thereby satisfying the requisite accuracy standards for orthopedic assessments. The measurement time is approximately 12 s, whereas manual measurement requires about 15 min, greatly reducing the time required. Additionally, the stability of the models was verified through K-fold cross-validation experiments. The proposed method meets the accuracy requirements for orthopedic applications, provides objective and reproducible results, significantly reduces the workload of medical professionals, and greatly improves efficiency.
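
The final step, computing a deformity angle from detected contour points, can be illustrated concretely. The sketch below fits a principal axis to each tibial segment's points and reports the angle between the two axes; the coordinates are invented for illustration.

    import numpy as np

    def bone_axis(points):
        pts = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(pts)      # first right singular vector = principal axis
        return vt[0]

    def angle_between(a, b):
        cos = abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    upper = np.array([[10, 5], [12, 60], [11, 120], [13, 180]], float)   # proximal segment
    lower = np.array([[14, 185], [20, 240], [27, 300], [33, 360]], float)  # distal segment
    print(f"deformity angle ≈ {angle_between(bone_axis(upper), bone_axis(lower)):.2f}°")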

Citations: 0
Simulation Design of a Triple Antenna Combination for PET-MRI Imaging Compatible With 3, 7, and 11.74 T MRI Scanner
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23191
Daniel Hernandez, Taewoo Nam, Eunwoo Lee, Geun Bae Ko, Jae Sung Lee, Kyoung-Nam Kim

The use of electromagnetism and the design of antennas in the field of medical imaging have played important roles in clinical practice. Specifically, magnetic resonance imaging (MRI) utilizes transmission and reception antennas, or coils, that are tuned to specific frequencies depending on the strength of the main magnet. Clinical scanners operating at 3 tesla (T) function at a frequency of 127 MHz, while research scanners at 7 T operate at 300 MHz. An 11.74 T scanner for human imaging, which is currently under development, will operate at a frequency of 500 MHz. MRI allows for the high-definition scanning of biological tissues, making it a valuable tool for enhancing images acquired with positron emission tomography (PET). PET is an imaging modality used to evaluate the metabolism of organs or cancers. With recent advancements in the development of portable PET systems that can be integrated into any MRI scanner, we propose a design, based on electromagnetic simulations, of a triple-tuned array of dipole antennas operating at 127, 300, and 500 MHz. This array can be attached to the PET insert and used in 3, 7, or 11.74 T scanners.
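
The quoted operating frequencies follow from the Larmor relation f = γ·B0, with γ ≈ 42.576 MHz/T for protons; the snippet below computes them (the 7 T and 11.74 T figures in the abstract are rounded).

    GAMMA_MHZ_PER_T = 42.576          # proton gyromagnetic ratio / 2π
    for b0 in (3.0, 7.0, 11.74):
        print(f"{b0:6.2f} T -> {GAMMA_MHZ_PER_T * b0:6.1f} MHz")
    # 3.00 T -> 127.7 MHz,  7.00 T -> 298.0 MHz,  11.74 T -> 499.8 MHz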

Citations: 0
Adapting Segment Anything Model for 3D Brain Tumor Segmentation With Missing Modalities
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-24 | DOI: 10.1002/ima.23177
Xiaoliang Lei, Xiaosheng Yu, Maocheng Bai, Jingsi Zhang, Chengdong Wu

The problem of missing or unavailable magnetic resonance imaging modalities challenges clinical diagnosis and medical image analysis technology. Although the development of deep learning and the proposal of large models have improved medical analytics, this problem still needs to be better resolved. The purpose of this study was to efficiently adapt the Segment Anything Model, a two-dimensional visual foundation model trained on natural images, to address the challenge of brain tumor segmentation with missing modalities. We designed a twin network structure that processes missing and intact magnetic resonance imaging (MRI) modalities separately using shared parameters. This involves comparing the features of the two network branches to minimize differences between the feature maps derived from them. We added a multimodal adapter before the image encoder and a spatial–depth adapter before the mask decoder to fine-tune the Segment Anything Model for brain tumor segmentation. The proposed method was evaluated using datasets provided by the MICCAI BraTS2021 Challenge. In terms of accuracy and robustness, the proposed method is better than existing solutions. The proposed method can segment brain tumors well under the missing modality condition.
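
A minimal sketch of the twin-branch idea, with a stand-in convolutional encoder rather than SAM itself: one shared-weight encoder processes the intact input and the modality-dropped input, and an MSE term pulls the two feature maps together.

    import torch
    import torch.nn as nn

    encoder = nn.Conv2d(4, 16, 3, padding=1)    # shared weights for both branches
    full = torch.randn(1, 4, 64, 64)            # all four MRI modalities present
    missing = full.clone()
    missing[:, 2] = 0.0                         # simulate one dropped modality

    f_full, f_missing = encoder(full), encoder(missing)
    # Feature-consistency loss; the intact branch is treated as the target.
    consistency_loss = nn.functional.mse_loss(f_missing, f_full.detach())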

Citations: 0
A Dual Cascaded Deep Theoretic Learning Approach for the Segmentation of the Brain Tumors in MRI Scans
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-23 | DOI: 10.1002/ima.23186
Jinka Sreedhar, Suresh Dara, C. H. Srinivasulu, Butchi Raju Katari, Ahmed Alkhayyat, Ankit Vidyarthi, Mashael M. Alsulami

Accurate segmentation of brain tumors from magnetic resonance imaging (MRI) is crucial for diagnosis, treatment planning, and monitoring of patients with neurological disorders. This paper proposes an approach for brain tumor segmentation employing a cascaded architecture integrating L-Net and W-Net deep learning models. The proposed cascaded model leverages the strengths of U-Net as a baseline model to enhance the precision and robustness of the segmentation process. In the proposed framework, the L-Net excels in capturing the mask, while the W-Net focuses on fine-grained features and spatial information to discern complex tumor boundaries. The cascaded configuration allows for a seamless integration of these complementary models, enhancing the overall segmentation performance. To evaluate the proposed approach, extensive experiments were conducted on the BraTS and SMS Medical College datasets comprising multi-modal MRI images. The experimental results demonstrate that the cascaded L-Net and W-Net model consistently outperforms individual models and other state-of-the-art segmentation methods. The Dice Similarity Coefficient values achieved indicate high segmentation accuracy, while the Sensitivity and Specificity metrics showcase the model's ability to correctly identify tumor regions and exclude healthy tissues. Moreover, the low Hausdorff Distance values confirm the model's capability to accurately delineate tumor boundaries. The proposed cascaded scheme leverages the strengths of each network, leading to superior performance compared with existing methods in the literature.
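
As a rough sketch of the cascade wiring only (the two convolutions below are placeholders, not the paper's actual L-Net and W-Net), the first stage predicts a coarse mask that is concatenated to the image as guidance for the second, finer stage.

    import torch
    import torch.nn as nn

    l_net = nn.Conv2d(1, 1, 3, padding=1)    # stage 1 stand-in: coarse mask
    w_net = nn.Conv2d(2, 1, 3, padding=1)    # stage 2 stand-in: image + mask input

    mri = torch.randn(1, 1, 96, 96)
    coarse = torch.sigmoid(l_net(mri))                            # coarse probability mask
    fine = torch.sigmoid(w_net(torch.cat([mri, coarse], dim=1)))  # refined boundary prediction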

Citations: 0
CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation
IF 3.0 | CAS Region 4 (Computer Science) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-21 | DOI: 10.1002/ima.23184
Junding Sun, Xiaopeng Zheng, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang

Due to the Transformer's ability to capture long-range dependencies through Self-Attention, it has shown immense potential in medical image segmentation. However, it lacks the capability to model local relationships between pixels. Therefore, many previous approaches embedded the Transformer into the CNN encoder. However, current methods often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and the loss of features for small targets. Furthermore, due to the high complexity of the Transformer, it is challenging to integrate local and global information within the same scale effectively. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascade to redesign the way CNN and Transformer are combined to enhance modeling the unique interrelationships between multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions. It not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism to enable it to model context information in feature maps from multiple perspectives within the channel dimension. These design features collectively enable CasUNeXt to better integrate local and global information and capture relationships between multi-scale features, thereby improving the performance of medical image segmentation. Through experimental comparisons on various benchmark datasets, our CasUNeXt method exhibits outstanding performance in medical image segmentation tasks, surpassing the current state-of-the-art methods.
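
The paper's exact multi-head channel attention is not reproduced here; as an assumed sketch of the general idea, channels can be split into heads, each with its own squeeze-and-excite style weighting.

    import torch
    import torch.nn as nn

    class MultiHeadChannelAttn(nn.Module):
        def __init__(self, channels, heads=4, reduction=4):
            super().__init__()
            assert channels % heads == 0
            self.heads, d = heads, channels // heads
            self.mlp = nn.Sequential(      # per-head excitation MLP (weights shared across heads)
                nn.Linear(d, d // reduction), nn.ReLU(), nn.Linear(d // reduction, d))

        def forward(self, x):
            b, c, h, w = x.shape
            g = x.view(b, self.heads, c // self.heads, h, w).mean(dim=(3, 4))  # per-head squeeze
            scale = torch.sigmoid(self.mlp(g)).view(b, c, 1, 1)                # per-channel gains
            return x * scale

    print(MultiHeadChannelAttn(32)(torch.randn(1, 32, 16, 16)).shape)  # (1, 32, 16, 16)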

Citations: 0