
Medical image analysis — Latest Articles

Complex wavelet-based Transformer for neurodevelopmental disorder diagnosis via direct modeling of real and imaginary components
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-14 | DOI: 10.1016/j.media.2025.103914
Ah-Yeong Jeong, Da-Woon Heo, Heung-Il Suk
Resting-state functional magnetic resonance imaging (rs-fMRI) measures intrinsic neural activity, and analyzing its frequency-domain characteristics provides insights into brain dynamics. Owing to these properties, rs-fMRI is widely used to investigate brain disorders such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD). Conventional frequency-domain analyses typically rely on the Fourier transform, which lacks flexibility in capturing non-stationary neural signals due to its fixed resolution. Furthermore, these methods primarily utilize only real-valued features, such as the magnitude or phase, derived from complex-valued spectral representations. Consequently, direct modeling of the real and imaginary components, particularly within fMRI analyses, remains largely unexplored, overlooking the distinct and complementary spectral information encoded in these components. To address these limitations, we propose a novel Transformer-based framework that explicitly models the real and imaginary components of continuous wavelet transform (CWT) coefficients from rs-fMRI signals. Our architecture integrates spectral, temporal, and spatial attention modules, employing self- and cross-attention mechanisms to jointly capture intra- and inter-component relationships. Applied to the Autism Brain Imaging Data Exchange (ABIDE)-I and ADHD-200 datasets, our approach achieved state-of-the-art classification performance compared to existing baselines. Comprehensive ablation studies demonstrated the advantages of directly utilizing real and imaginary components over conventional frequency-domain features and validated each module’s contribution. Moreover, attention-based analyses revealed frequency- and region-specific patterns consistent with known neurobiological alterations in ASD and ADHD. These findings highlight that preserving and jointly leveraging the real and imaginary components of CWT-based representations not only enhances diagnostic performance but also provides interpretable insights into neurodevelopmental disorders.
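The abstract builds on complex-valued continuous wavelet coefficients of rs-fMRI time series. As a rough illustration of that input representation (not the authors' pipeline), the sketch below computes CWT coefficients of a single ROI signal with PyWavelets and keeps the real and imaginary parts as separate channels; the complex Morlet wavelet, scale range, and TR are assumptions.

```python
# Sketch: real and imaginary CWT coefficients of one rs-fMRI ROI time series.
# Assumptions (not from the paper): complex Morlet wavelet 'cmor1.5-1.0',
# 30 log-spaced scales, TR = 2.0 s, toy random signal.
import numpy as np
import pywt

tr = 2.0                                  # repetition time in seconds (assumed)
signal = np.random.randn(200)             # one ROI time series, 200 volumes (toy)

scales = np.geomspace(4, 64, num=30)      # 30 scales (assumed)
coeffs, freqs = pywt.cwt(signal, scales, "cmor1.5-1.0", sampling_period=tr)

real_part = coeffs.real                   # (30, 200) real component
imag_part = coeffs.imag                   # (30, 200) imaginary component

# Stack the two components as separate input channels for a downstream model,
# instead of collapsing them into magnitude/phase.
x = np.stack([real_part, imag_part], axis=0)   # (2, scales, time)
print(x.shape, freqs.min(), freqs.max())
```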
Citations: 0
CLIP-Guided Generative network for pathology nuclei image augmentation
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-14 | DOI: 10.1016/j.media.2025.103908
Yanan Zhang, Qingyang Liu, Qian Chen, Xiangzhi Bai
Nuclei segmentation and classification play a crucial role in the quantitative analysis of computational pathology (CPath). However, the challenge of creating a large volume of labeled pathology nuclei images due to annotation costs has significantly limited the performance of deep learning-based nuclei segmentation methods. Generative data augmentation offers a promising solution by substantially expanding the available training data without additional annotations. In medical image analysis, Generative Adversarial Networks (GANs) were effective for data augmentation, enhancing model performance by generating realistic synthetic data. However, these approaches lack scalability for multi-class data, as nuclei masks cannot provide sufficient information for diverse image generation. Recently, visual-language foundation models, pretrained on large-scale image-caption pairs, have demonstrated robust performance in pathological diagnostic tasks. In this study, we propose a CLIP-guided generative data augmentation method for nuclei segmentation and classification, leveraging the pretrained pathological CLIP text and image encoders in both the generator and discriminator. Specifically, we first create text descriptions by processing paired histopathology images and nuclei masks, which include information such as organ tissue type, cell count, and nuclei types. These paired text descriptions and nuclei masks are then fed into our multi-modal conditional image generator to guide the synthesis of realistic histopathology images. To ensure the quality of synthesized images, we utilize a high-resolution image discriminator and a CLIP image encoder-based discriminator, focusing on both local and global features of histopathology images. The synthetic histopathology images, paired with corresponding nuclei masks, are integrated into the real dataset to train the nuclei segmentation and classification model. Our experiments, conducted on diverse publicly available pathology nuclei datasets, including both qualitative and quantitative analysis, demonstrate the effectiveness of our proposed method. The experimental results of the nuclei segmentation and classification task underscore the advantages of our data augmentation approach. The code is available at https://github.com/zhangyn1415/CGPN-GAN.
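The abstract describes turning paired histopathology images and nuclei masks into text descriptions (organ tissue type, cell count, nuclei types) before they are encoded by the CLIP text encoder. A minimal sketch of such a captioning step is given below; the class label map, caption template, and the helper name `mask_to_caption` are illustrative assumptions, not the paper's exact format.

```python
# Sketch: build a caption from paired instance/class nuclei masks.
# The class ids/names and the caption wording are assumed, not the paper's.
import numpy as np

CLASS_NAMES = {1: "neoplastic", 2: "inflammatory", 3: "connective",
               4: "dead", 5: "epithelial"}          # assumed label map

def mask_to_caption(inst_mask: np.ndarray, class_mask: np.ndarray, organ: str) -> str:
    """inst_mask: (H, W) ints, 0 = background, k > 0 = nucleus id;
       class_mask: (H, W) ints with per-pixel nuclei class ids."""
    parts, total = [], 0
    for cls_id, cls_name in CLASS_NAMES.items():
        ids = np.unique(inst_mask[class_mask == cls_id])
        count = int((ids > 0).sum())                # nuclei instances of this class
        total += count
        if count:
            parts.append(f"{count} {cls_name} nuclei")
    detail = ", ".join(parts) if parts else "no visible nuclei"
    return f"A histopathology image of {organ} tissue with {total} nuclei: {detail}."

inst = np.zeros((64, 64), dtype=int)
cls = np.zeros((64, 64), dtype=int)
inst[10:20, 10:20] = 1                              # one toy nucleus instance
cls[10:20, 10:20] = 1                               # labelled as neoplastic
print(mask_to_caption(inst, cls, "colon"))
```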
Citations: 0
BIASNet: A bidirectional feature alignment and semantics-guided network for weakly-supervised medical image registration
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-12 | DOI: 10.1016/j.media.2025.103913
Housheng Xie, Xiaoru Gao, Guoyan Zheng
Medical image registration, which establishes spatial correspondences between different medical images, serves as a fundamental process in numerous clinical applications and diagnostic workflows. Despite significant advancement in unsupervised deep learning-based registration methods, these approaches consistently yield suboptimal results compared to their weakly-supervised counterparts. Recent advancements in universal segmentation models have made it easier to obtain anatomical labels from medical images. However, existing registration methods have not fully leveraged the rich anatomical and structural prior information provided by segmentation labels. To address this limitation, we propose a BIdirectional feature Alignment and Semantics-guided Network, referred to as BIASNet, for weakly-supervised image registration. Specifically, starting from multi-scale features extracted from the pre-trained VoCo, fine-tuned using Low-Rank Adaptation (LoRA), we propose a dual-attribute learning scheme, incorporating a novel BIdirectional Alignment and Fusion (BIAF) module for extracting both semantics-wise and intensity-wise features. These two types of features are subsequently fed into a semantics-guided progressive registration framework for accurate deformation field estimation. We further propose an anatomical region deformation consistency learning to regularize the target anatomical regions deformation. Comprehensive experiments conducted on three typical yet challenging datasets demonstrate that our method achieves consistently better results than other state-of-the-art deformable registration approaches. The source code is publicly available at https://github.com/xiehousheng/BIASNet.
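Like most learning-based deformable registration networks, the pipeline above ultimately predicts a deformation field that is used to warp the moving image. The sketch below shows that generic warping step with `torch.nn.functional.grid_sample`; it is a standard building block under assumed conventions (displacement in voxels, channel order dx/dy/dz), not BIASNet's own modules.

```python
# Sketch: warp a moving 3-D volume with a predicted displacement field
# (spatial-transformer step common to learning-based deformable registration).
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, disp: torch.Tensor) -> torch.Tensor:
    """moving: (B, C, D, H, W); disp: (B, 3, D, H, W) displacement in voxels,
       channels assumed ordered (dx, dy, dz) along W, H, D respectively."""
    B, _, D, H, W = moving.shape
    zz, yy, xx = torch.meshgrid(torch.arange(D), torch.arange(H),
                                torch.arange(W), indexing="ij")
    identity = torch.stack((xx, yy, zz), dim=0).float().to(moving)  # (3, D, H, W)
    coords = identity.unsqueeze(0) + disp                           # displaced voxel coords
    # normalise each axis to [-1, 1] as expected by grid_sample
    for i, size in enumerate((W, H, D)):
        coords[:, i] = 2.0 * coords[:, i] / (size - 1) - 1.0
    grid = coords.permute(0, 2, 3, 4, 1)                            # (B, D, H, W, 3)
    return F.grid_sample(moving, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

moving = torch.rand(1, 1, 32, 32, 32)
disp = torch.zeros(1, 3, 32, 32, 32)          # zero field -> (near) identity warp
print(float((warp(moving, disp) - moving).abs().max()))   # ~0
```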
Citations: 0
Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103910
Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan, Zhifan Gao, Lei Zhu, Giorgos Papanastasiou, Yingying Fang, Guang Yang
Radiology report generation is critical for efficiency, but current models often lack the structured reasoning of experts and the ability to explicitly ground findings in anatomical evidence, which limits clinical trust and explainability. This paper introduces BoxMed-RL, a unified training framework to generate spatially verifiable and explainable chest X-ray reports. BoxMed-RL advances chest X-ray report generation through two integrated phases: (1) Pretraining Phase. BoxMed-RL learns radiologist-like reasoning through medical concept learning and enforces spatial grounding with reinforcement learning. (2) Downstream Adapter Phase. Pretrained weights are frozen while a lightweight adapter ensures fluency and clinical credibility. Experiments on two widely used public benchmarks (MIMIC-CXR and IU X-Ray) demonstrate that BoxMed-RL achieves an average 7 % improvement in both METEOR and ROUGE-L metrics compared to state-of-the-art methods. An average 5 % improvement in large language model-based metrics further underscores BoxMed-RL’s robustness in generating high-quality reports. Related code and training templates are publicly available at https://github.com/ayanglab/BoxMed-RL.
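METEOR and ROUGE-L are the surface-level metrics the abstract reports. As a self-contained reference, here is a plain ROUGE-L F-score computed from the longest common subsequence of whitespace tokens; the tokenization and beta value follow common convention and are not taken from the paper.

```python
# Sketch: ROUGE-L F-score between a generated and a reference report,
# via longest common subsequence (LCS) over lowercase whitespace tokens.
def lcs_len(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(hyp: str, ref: str, beta: float = 1.2) -> float:
    h, r = hyp.lower().split(), ref.lower().split()
    lcs = lcs_len(h, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(h), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

print(rouge_l("mild cardiomegaly without pleural effusion",
              "there is mild cardiomegaly and no pleural effusion"))
```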
Citations: 0
AtlasMorph: Learning conditional deformable templates for brain MRI
IF 10.9 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103893
Marianne Rakic, Andrew Hoopes, S. Mazdak, Mert R. Sabuncu, John V. Guttag, Adrian V. Dalca
{"title":"AtlasMorph: Learning conditional deformable templates for brain MRI","authors":"Marianne Rakic, Andrew Hoopes, S. Mazdak, Mert R. Sabuncu, John V. Guttag, Adrian V. Dalca","doi":"10.1016/j.media.2025.103893","DOIUrl":"https://doi.org/10.1016/j.media.2025.103893","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"14 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145731917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CIA-net: Cross-modality interaction and aggregation network for ovarian tumor segmentation from multi-modal MRI
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103907
Yifan Gao, Yong’ai Li, Xin Gao
Magnetic resonance imaging (MRI) is an essential examination for ovarian cancer, in which ovarian tumor segmentation is crucial for personalized diagnosis and treatment planning. However, ovarian tumors often present with mixed cystic and solid regions in imaging, posing additional difficulties for automatic segmentation. In clinical practice, radiologists use T2-weighted imaging as the main modality to delineate tumor boundaries. In comparison, multi-modal MRI provides complementary information across modalities that can improve tumor segmentation. Therefore, it is important to fuse salient features from other modalities to the main modality. In this paper, we propose a cross-modality interaction and aggregation network (CIA-Net), a hybrid convolutional and Transformer architecture, for automatic ovarian tumor segmentation from multi-modal MRI. CIA-Net divides multi-modal MRI into one main (T2) and three minor modalities (T1, ADC, DWI), each with independent encoders. The novel cross-modality collaboration block selectively aggregates complementary features from minor modalities into the main modality through a progressive context injection module. Additionally, we introduce the progressive neighborhood integrated module to filter intra- and inter-modality noise and redundancies by refining adjacent slices of each modality. We evaluate our proposed method on a diverse, multi-center ovarian tumor dataset comprising 739 patients, and further validate its generalization and robustness on two public benchmarks for brain and cardiac segmentation. Comparative experiments with other cutting-edge techniques demonstrate the effectiveness of CIA-Net, highlighting its potential to be applied in clinical scenarios.
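The fusion of salient minor-modality features into the main T2 stream can be pictured as a cross-attention block in which T2 tokens act as queries and another modality's tokens as keys/values. The sketch below is a generic illustration under assumed dimensions, not the paper's cross-modality collaboration or context-injection modules.

```python
# Sketch: cross-attention that injects minor-modality (e.g. ADC) features into
# the main T2 token stream. Dimensions and the single-block design are assumed.
import torch
import torch.nn as nn

class CrossModalityBlock(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, main_tokens, minor_tokens):
        # queries from the main (T2) modality, keys/values from a minor modality
        fused, _ = self.attn(main_tokens, minor_tokens, minor_tokens)
        x = self.norm1(main_tokens + fused)       # residual injection into main stream
        return self.norm2(x + self.ffn(x))

t2 = torch.rand(2, 196, 128)      # (batch, tokens, dim) main-modality features
adc = torch.rand(2, 196, 128)     # minor-modality features
print(CrossModalityBlock()(t2, adc).shape)   # torch.Size([2, 196, 128])
```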
Citations: 0
MIRAGE: Medical image-text pre-training for robustness against noisy environments
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-10 | DOI: 10.1016/j.media.2025.103912
Pujin Cheng, Yijin Huang, Li Lin, Junyan Lyu, Kenneth Kin-Yip Wong, Xiaoying Tang
Contrastive vision-language pre-training models have achieved significant success on large-scale general multi-modality datasets. However, in the medical domain, the high costs of data collection and expert annotation are likely to result in small-sized and noisy datasets, which can severely limit model performance due to overfitting unreliable data and misrepresenting patterns. To address this challenge, we present MIRAGE, a novel framework designed to handle mismatched false positives and semantically related false negatives during medical image-text pre-training. Cross-entropy-based optimization proves inadequate for noisy contrastive settings, as it tends to fail in distinguishing noisy samples and ends up fitting them, leading to suboptimal representations. To overcome this limitation, we introduce an optimal transport-based contrastive loss that effectively identifies noisy samples leveraging the nearest cross-modality neighbor prior, thereby reducing noisy samples’ adverse impact. Additionally, we propose an adaptive gradient balancing strategy that mitigates the influence of gradients from noisy samples. Extensive experiments demonstrate that MIRAGE achieves superior performance across six tasks and 14 datasets, largely outperforming representative state-of-the-art methods. Furthermore, comprehensive analyses on synthetic noisy data are performed, clearly demonstrating the contribution of each component in MIRAGE.
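The abstract's loss builds on optimal transport between image and text embeddings. As background only, the sketch below runs plain entropic (Sinkhorn) optimal transport on a cosine-similarity matrix and reads off how much mass each pair keeps on the diagonal, one simple way such a plan can flag likely-mismatched pairs; the epsilon, iteration count, and weighting rule are assumptions, not MIRAGE's formulation.

```python
# Sketch: entropic optimal transport (Sinkhorn) over an image-text similarity
# matrix; diagonal mass serves as a crude pairing-confidence signal (assumed).
import torch
import torch.nn.functional as F

def sinkhorn_plan(sim: torch.Tensor, eps: float = 0.05, n_iter: int = 50) -> torch.Tensor:
    """sim: (N, N) similarity matrix; returns a transport plan with
       (approximately) uniform row and column marginals."""
    n = sim.size(0)
    K = torch.exp(sim / eps)                  # Gibbs kernel
    a = torch.full((n,), 1.0 / n)             # uniform marginals
    b = torch.full((n,), 1.0 / n)
    v = torch.ones(n)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.t() @ u)
    return torch.diag(u) @ K @ torch.diag(v)

img = F.normalize(torch.randn(8, 64), dim=1)   # toy image embeddings
txt = F.normalize(torch.randn(8, 64), dim=1)   # toy caption embeddings
plan = sinkhorn_plan(img @ txt.t())
# pairs with little diagonal mass are candidates for down-weighting as mismatches
print(float(plan.sum()), plan.diag())
```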
Citations: 0
Segmentation of the right ventricular myocardial infarction in multi-centre cardiac magnetic resonance images
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-09 | DOI: 10.1016/j.media.2025.103911
Chao Xu, Dongaolei An, Chaolu Feng, Zijian Bian, Lian-Ming Wu
Right ventricular myocardial infarction (RVMI) is associated with higher in-hospital morbidity and mortality. Cardiac magnetic resonance (CMR) imaging provides crucial pathological information for diagnosis and/or treatment of RVMI. Segmentation of RVMI in CMR images is significant but challenging. This is because, to the best of our knowledge, there is no publicly available dataset in this field. Furthermore, the severe class imbalance problem caused by a foreground proportion of mostly less than 0.2 % and the extreme intensity overlap between RVMI and the background bring challenges to the design of a segmentation model. Therefore, we release a benchmark CMR dataset consisting of short-axis MR images of 213 subjects from 3 centres acquired with Philips, GE, and Siemens equipment. A multi-stage sequential deep learning model, RVMISegNet, is proposed to segment RVMI and its related organs at different scales to tackle the class imbalance and intensity overlap problems. In the first stage, transfer learning is employed to localize the right ventricle region. In the second stage, the centroid of the right ventricle guides the extraction of a region of interest, where pseudo-labels are generated to assist a coarse segmentation of myocardial infarction. In the third stage, morphological post-processing is applied, and fine segmentation is performed. Both the coarse and fine segmentation stages use a modified UNet++ backbone, which integrates texture and semantic extraction modules. Extensive experiments validate the state-of-the-art performance of our model and the effectiveness of its constituent modules. The dataset and source code are available at https://github.com/DFLAG-NEU/RVMISegNet.
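The under-0.2 % foreground proportion mentioned above is the kind of imbalance that overlap-based losses are designed for. The following sketch is a generic soft Dice loss on a toy mask with a similarly tiny foreground; it illustrates the imbalance issue only and is not the paper's multi-stage training objective.

```python
# Sketch: soft Dice loss, a common remedy for extreme foreground/background
# imbalance (here the toy foreground covers ~0.1 % of the image).
import torch

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits: (B, 1, H, W) raw scores; target: (B, 1, H, W) binary mask."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

logits = torch.randn(2, 1, 128, 128, requires_grad=True)
target = torch.zeros(2, 1, 128, 128)
target[:, :, 60:64, 60:64] = 1.0           # tiny foreground region (16 / 16384 pixels)
loss = soft_dice_loss(logits, target)
loss.backward()
print(float(loss))
```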
Citations: 0
Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-09 | DOI: 10.1016/j.media.2025.103904
Chenxiao Zhang, Runshi Zhang, Junchen Wang
Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key component of the computer-assisted surgery workflow. The low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundaries, which may lead to small object losses and increase boundary segmentation errors. Object tracking in long videos also remains a significant research challenge. To overcome these challenges, we propose a memory bank-based wavelet filtering and fusion network, which adopts an encoder-decoder structure to effectively extract fine-grained detailed spatial features and integrate high-frequency (HF) information. Specifically, memory-based wavelet convolution is presented to simultaneously capture category and detailed information and to utilize adjacent information in the encoder. Cascaded wavelet compression is used to fuse multiscale frequency-domain features and expand the receptive field within each convolutional layer. A long short-term memory bank using cross-attention and memory compression mechanisms is designed to track objects in long video. To fully utilize the boundary-sensitive HF details of feature maps, an HF-aware feature fusion module is designed via adaptive wavelet filters in the decoder. In extensive benchmark tests on four ultrasound video datasets (two thyroid nodule datasets, a thyroid gland dataset, and a heart dataset), our method demonstrates marked improvements in segmentation metrics compared with state-of-the-art methods. In particular, our method can more accurately segment small thyroid nodules, demonstrating its effectiveness for cases involving small ultrasound objects in long video. The code is available at https://github.com/XiAooZ/MWNet.
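The high-frequency information exploited by the network comes from wavelet decompositions of the input. The sketch below shows a one-level 2-D DWT of a toy ultrasound frame with PyWavelets and the HF detail subbands that carry most of the boundary information; the Haar wavelet and single decomposition level are assumptions, not the paper's filters.

```python
# Sketch: one-level 2-D wavelet decomposition of a (toy) grayscale ultrasound frame.
import numpy as np
import pywt

frame = np.random.rand(256, 256).astype(np.float32)    # toy grayscale US frame

cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")
# cA: low-frequency approximation; cH/cV/cD: horizontal/vertical/diagonal
# high-frequency details that carry most of the boundary information.
hf_energy = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)        # simple HF saliency map
print(cA.shape, hf_energy.shape)                        # (128, 128) (128, 128)
```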
Citations: 0
SAM-Swin: SAM-driven dual-swin transformers with adaptive lesion enhancement for Laryngo-Pharyngeal tumor detection
IF 11.8 | Medicine, Zone 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-08 | DOI: 10.1016/j.media.2025.103906
Jia Wei, Yun Li, Xiaomao Fan, Wenjun Ma, Meiyu Qiu, Hongyu Chen, Wenbin Lei
Laryngo-pharyngeal cancer (LPC) is a highly lethal malignancy in the head and neck region. Recent advancements in tumor detection, particularly through dual-branch network architectures, have significantly improved diagnostic accuracy by integrating global and local feature extraction. However, challenges remain in accurately localizing lesions and fully capitalizing on the complementary nature of features within these branches. To address these issues, we propose SAM-Swin, an innovative SAM-driven Dual-Swin Transformer for laryngo-pharyngeal tumor detection. This model leverages the robust segmentation capabilities of the Segment Anything Model 2 (SAM2) to achieve precise lesion segmentation. Meanwhile, we present a multi-scale lesion-aware enhancement module (MS-LAEM) designed to adaptively enhance the learning of nuanced complementary features across various scales, improving the quality of feature extraction and representation. Furthermore, we implement a multi-scale class-aware guidance (CAG) loss that delivers multi-scale targeted supervision, thereby enhancing the model’s capacity to extract class-specific features. To validate our approach, we compiled three LPC datasets from the First Affiliated Hospital (FAHSYSU), the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University, and Nanfang Hospital of Southern Medical University (NHSMU). The FAHSYSU dataset is utilized for internal training, while the SAHSYSU and NHSMU datasets serve for external evaluation. Extensive experiments demonstrate that SAM-Swin outperforms state-of-the-art methods, showcasing its potential for advancing LPC detection and improving patient outcomes. The source code of SAM-Swin is available at the URL of https://github.com/VVJia/SAM-Swin.
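The multi-scale class-aware guidance (CAG) loss supervises class-specific features at several scales. One plausible, simplified reading is a per-class-weighted cross-entropy averaged over scale-specific predictions, sketched below; the class set, weights, and averaging rule are assumptions, and the paper's exact formulation may differ.

```python
# Sketch: multi-scale supervision with per-class weights (assumed reading of
# "multi-scale class-aware guidance"; not the paper's exact CAG loss).
import torch
import torch.nn.functional as F

def multiscale_weighted_ce(logits_per_scale, target, class_weights):
    """logits_per_scale: list of (B, C) logits from different scales;
       target: (B,) class indices; class_weights: (C,) tensor."""
    return sum(F.cross_entropy(logits, target, weight=class_weights)
               for logits in logits_per_scale) / len(logits_per_scale)

B, C = 4, 3                                    # e.g. normal / benign / tumor (assumed)
logits = [torch.randn(B, C, requires_grad=True) for _ in range(3)]   # 3 scales
target = torch.randint(0, C, (B,))
weights = torch.tensor([1.0, 1.0, 2.0])        # up-weight the tumor class (assumed)
loss = multiscale_weighted_ce(logits, target, weights)
loss.backward()
print(float(loss))
```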
Citations: 0