
IEEE Transactions on Medical Imaging: Latest Publications

MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography
Pub Date : 2025-09-12 DOI: 10.1109/TMI.2025.3609319
Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang
Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet
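The cross-modal alignment step described above can be made concrete with a small sketch. The snippet below is a generic CLIP-style contrastive loss between video and tabular report embeddings, not the authors' released code; the encoder outputs, batch pairing, and temperature value are assumptions.

```python
# Minimal sketch (not the MultiASNet implementation): CLIP-style contrastive
# alignment between B-mode video embeddings and tabular report embeddings.
import torch
import torch.nn.functional as F

def video_report_contrastive_loss(video_emb, report_emb, temperature=0.07):
    """video_emb, report_emb: (batch, dim) embeddings from the same patients."""
    v = F.normalize(video_emb, dim=-1)
    r = F.normalize(report_emb, dim=-1)
    logits = v @ r.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric InfoNCE: each video should match its own patient's report and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical usage: loss = video_report_contrastive_loss(video_encoder(x_video), tab_encoder(x_report))
```

In the setting the abstract describes, the report branch would only be needed during training, so inference could run on the video encoder alone.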
{"title":"MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography","authors":"Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang","doi":"10.1109/TMI.2025.3609319","DOIUrl":"10.1109/TMI.2025.3609319","url":null,"abstract":"Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"799-810"},"PeriodicalIF":0.0,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Label-Efficient Deep Color Deconvolution of Brightfield Multiplex IHC Images
Pub Date : 2025-09-11 DOI: 10.1109/TMI.2025.3609245
Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen
Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and helps uncover how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increases. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using innovative reconstruction losses inspired by physics. We show how we can integrate weak annotations and the abundant unlabeled data available to train a model to reliably unmix the multiplexed stains and generate stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on an mIHC dataset of 7-plex IHC images.
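For context, classical color deconvolution unmixes stains linearly in optical-density space, which only works cleanly when the number of stains is at most three; this is the limitation the learned autoencoder approach is meant to overcome. The sketch below shows the conventional Beer-Lambert formulation with a placeholder stain matrix, not the paper's method.

```python
# Illustrative sketch of classical linear color deconvolution (Ruifrok-Johnston style).
import numpy as np

def unmix_stains(rgb_image, stain_matrix, background=255.0):
    """rgb_image: (H, W, 3) uint8; stain_matrix: rows are unit RGB absorbance vectors (placeholder)."""
    # Beer-Lambert optical density: absorbance is linear in stain concentration.
    od = -np.log10(np.clip(rgb_image.astype(float), 1, None) / background)
    flat = od.reshape(-1, 3)
    # Least-squares unmixing; with more than 3 stains this system is underdetermined.
    concentrations, *_ = np.linalg.lstsq(stain_matrix.T, flat.T, rcond=None)
    return concentrations.T.reshape(rgb_image.shape[0], rgb_image.shape[1], -1)
```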
{"title":"Label-Efficient Deep Color Deconvolution of Brightfield Multiplex IHC Images","authors":"Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen","doi":"10.1109/TMI.2025.3609245","DOIUrl":"10.1109/TMI.2025.3609245","url":null,"abstract":"Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables the exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and to uncover how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations present as indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increase. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using innovative reconstruction losses inspired by physics. We show how we can integrate weak annotations and the abundant unlabeled data available to train a model to reliably unmix the multiplexed stains and generate stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on mIHC dataset of 7-plexed IHC images.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"853-864"},"PeriodicalIF":0.0,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145035225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Quantifying Tumor Microvasculature With Optical Coherence Angiography and Intravoxel Incoherent Motion Diffusion MRI
Pub Date : 2025-09-10 DOI: 10.1109/TMI.2025.3607752
W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin
Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI's limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high-resolution 3D measurements of the vascular volume density (VVD) and the average Doppler phase shift ($\overline{\Delta\phi}$), respectively. IVIM imaging was performed on a 7T preclinical MRI scanner to generate maps of the perfusion fraction $f$, the extravascular diffusion coefficient $D_{\mathrm{slow}}$, and the intravascular diffusion coefficient $D_{\mathrm{fast}}$. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT's VVD and MR's $f$ (Pearson correlation coefficient $r = 0.34$, $p < 0.0001$). Surprisingly, no significant correlation was found between $\overline{\Delta\phi}$ and $D_{\mathrm{fast}}$. This may be due to larger errors in the determined $D_{\mathrm{fast}}$ values compared to $f$, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging supports the biomedical relevance of IVIM-MRI metrics, for example through $f$'s relationship to the VVD.
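As background, $f$, $D_{\mathrm{slow}}$ and $D_{\mathrm{fast}}$ are conventionally obtained by fitting the bi-exponential IVIM signal model to the diffusion-weighted signal acquired at multiple b-values; the abstract does not spell out the fitting form, so the standard formulation is shown here:

$$\frac{S(b)}{S_0} = f\,e^{-b\,D_{\mathrm{fast}}} + (1-f)\,e^{-b\,D_{\mathrm{slow}}}$$

Here $S(b)$ is the diffusion-weighted signal at b-value $b$, $S_0$ is the signal without diffusion weighting, and $f$ is the fraction of signal attributed to perfusion (the pseudo-diffusion compartment).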
{"title":"Quantifying Tumor Microvasculature With Optical Coherence Angiography and Intravoxel Incoherent Motion Diffusion MRI","authors":"W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin","doi":"10.1109/TMI.2025.3607752","DOIUrl":"10.1109/TMI.2025.3607752","url":null,"abstract":"Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI’s limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high resolution 3D measurements of the vascular volume density (VVD) and average Doppler phase shift (<inline-formula> <tex-math>$overline {Delta phi }text {)}$ </tex-math></inline-formula> respectively. IVIM imaging was performed on a 7T preclinical MRI scanner, to generate maps of the perfusion fraction f, the extravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {slow}}$ </tex-math></inline-formula>, and the intravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT’s VVD and MR’s f (Pearson correlation coefficient <inline-formula> <tex-math>${r}={0}.{34},{p}lt {0}.{0001}text {)}$ </tex-math></inline-formula>. Surprisingly, no significant correlation was found between <inline-formula> <tex-math>$overline {Delta phi }$ </tex-math></inline-formula> and <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. This may be due to larger errors in the determined <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula> values compared to f, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging support the biomedical relevance of IVIM-MRI metrics, for example through f’s relationship to the VVD.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"789-798"},"PeriodicalIF":0.0,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145031941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CASHNet: Context-Aware Semantics-Driven Hierarchical Network for Hybrid Diffeomorphic CT-CBCT Image Registration
Pub Date : 2025-09-09 DOI: 10.1109/TMI.2025.3607700
Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng
Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at https://github.com/xiaorugao999/CASHNet
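Since the abstract frames registration in terms of diffeomorphisms and velocity fields, a standard way to turn a predicted stationary velocity field into a topology-preserving displacement is scaling and squaring. The sketch below shows that generic 2D recipe; it is not taken from the CASHNet implementation, and the tensor layouts are assumptions.

```python
# Hedged sketch: integrate a stationary velocity field by scaling and squaring.
import torch
import torch.nn.functional as F

def integrate_velocity(velocity, steps=7):
    """velocity: (N, 2, H, W) stationary velocity field in voxel units."""
    n, _, h, w = velocity.shape
    disp = velocity / (2 ** steps)                       # scale down
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(velocity.device)  # identity grid (2, H, W)
    for _ in range(steps):                               # square: phi <- phi o phi
        coords = grid.unsqueeze(0) + disp
        norm = torch.stack((2 * coords[:, 0] / (w - 1) - 1,
                            2 * coords[:, 1] / (h - 1) - 1), dim=-1)  # to [-1, 1] for grid_sample
        disp = disp + F.grid_sample(disp, norm, align_corners=True, padding_mode="border")
    return disp
```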
{"title":"CASHNet: Context-Aware Semantics-Driven Hierarchical Network for Hybrid Diffeomorphic CT-CBCT Image Registration","authors":"Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng","doi":"10.1109/TMI.2025.3607700","DOIUrl":"10.1109/TMI.2025.3607700","url":null,"abstract":"Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at <uri>https://github.com/xiaorugao999/CASHNet</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"825-842"},"PeriodicalIF":0.0,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145025299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lesion Asymmetry Screening Assisted Global Awareness Multi-View Network for Mammogram Classification
Pub Date : 2025-09-09 DOI: 10.1109/TMI.2025.3607877
Xinchuan Liu;Luhao Sun;Chao Li;Bowen Han;Wenzong Jiang;Tianhao Yuan;Weifeng Liu;Zhaoyun Liu;Zhiyong Yu;Baodi Liu
Mammography is a primary method for early breast cancer screening, and developing deep learning-based computer-aided systems is of great significance. However, current deep learning models typically treat each image as an independent entity for diagnosis, rather than integrating images from multiple views to diagnose the patient. These methods do not fully consider and address the complex interactions between different views, resulting in poor diagnostic performance and interpretability. To address this issue, this paper proposes a novel end-to-end framework for breast cancer diagnosis: the lesion asymmetry screening assisted global awareness multi-view network (LAS-GAM). Unlike most image-level diagnostic models, LAS-GAM operates at the patient level, simulating the workflow of radiologists analyzing mammographic images. The framework processes the four views of a patient and revolves around two key modules: a global module and a lesion screening module. The global module simulates the comprehensive assessment by radiologists, integrating complementary information from the craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts to generate global features that represent the patient's overall condition. The lesion screening module mimics the process of locating lesions by comparing symmetric regions in contralateral views, identifying potential lesion areas and extracting lesion-specific features using a lightweight model. By combining the global features and lesion-specific features, LAS-GAM simulates the diagnostic process, making patient-level predictions. Moreover, it is trained using only patient-level labels, significantly reducing data annotation costs. Experiments on the Digital Database for Screening Mammography (DDSM) and an in-house dataset validate LAS-GAM, achieving AUCs of 0.817 and 0.894, respectively.
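The contralateral-symmetry idea behind the lesion screening module can be illustrated with a toy difference map between mirrored views; the snippet assumes the two views are already roughly aligned and is only meant to convey the intuition, not the paper's learned module.

```python
# Toy sketch of breast asymmetry screening: mirror the contralateral view and
# use the absolute difference as a crude candidate-lesion map.
import numpy as np

def asymmetry_map(left_cc, right_cc):
    """left_cc, right_cc: (H, W) roughly aligned CC views of the two breasts."""
    mirrored = np.fliplr(right_cc)              # mirror so both breasts face the same way
    return np.abs(left_cc.astype(float) - mirrored.astype(float))
```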
{"title":"Lesion Asymmetry Screening Assisted Global Awareness Multi-View Network for Mammogram Classification","authors":"Xinchuan Liu;Luhao Sun;Chao Li;Bowen Han;Wenzong Jiang;Tianhao Yuan;Weifeng Liu;Zhaoyun Liu;Zhiyong Yu;Baodi Liu","doi":"10.1109/TMI.2025.3607877","DOIUrl":"10.1109/TMI.2025.3607877","url":null,"abstract":"Mammography is a primary method for early screening, and developing deep learning-based computer-aided systems is of great significance. However, current deep learning models typically treat each image as an independent entity for diagnosis, rather than integrating images from multiple views to diagnose the patient. These methods do not fully consider and address the complex interactions between different views, resulting in poor diagnostic performance and interpretability. To address this issue, this paper proposes a novel end-to-end framework for breast cancer diagnosis: lesion asymmetry screening assisted global awareness multi-view network (LAS-GAM). More than just the most common image-level diagnostic model, LAS-GAM operates at the patient level, simulating the workflow of radiologists analyzing mammographic images. The framework processes the four views of a patient and revolves around two key modules: a global module and a lesion screening module. The global module simulates the comprehensive assessment by radiologists, integrating complementary information from the craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts to generate global features that represent the patient’s overall condition. The lesion screening module mimics the process of locating lesions by comparing symmetric regions in contralateral views, identifying potential lesion areas and extracting lesion-specific features using a lightweight model. By combining the global features and lesion-specific features, LAS-GAM simulates the diagnostic process, making patient-level predictions. Moreover, it is trained using only patient-level labels, significantly reducing data annotation costs. Experiments on the Digital Database for Screening Mammography (DDSM) and In-house datasets validate LAS-GAM, achieving AUCs of 0.817 and 0.894, respectively.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"777-788"},"PeriodicalIF":0.0,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145025300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Co-Activation Pattern Analysis Based on Hidden Semi-Markov Model for Brain Spatiotemporal Dynamics
Pub Date : 2025-09-08 DOI: 10.1109/TMI.2025.3607113
Zihao Yuan;Jiaqing Chen;Han Qiu;Houxiang Wang;Yangxin Huang;Fuchun Lin
Analyzing the spontaneous activity of the human brain using dynamic approaches can reveal functional organizations. Co-activation pattern (CAP) analysis of signals from different brain regions is used to characterize brain neural networks that may serve specialized functions. However, CAP is based on spatial information while ignoring temporally reproducible transition patterns, and it lacks robustness to data with a low signal-to-noise ratio (SNR). To address these issues, this study proposes a new CAP framework based on a hidden semi-Markov model (HSMM), called HSMM-CAP analysis, which can be used to investigate spatiotemporal CAPs (stCAPs) of the brain. HSMM-CAP uses empirical spatial distributions of stCAPs as emission models and assumes that the state sequence of stCAPs follows a semi-Markov process. Based on the assumptions of sparsity, heterogeneity, and the semi-Markov property of stCAPs, the HSMM-CAP-K-means method is constructed to infer the state sequence and transition parameters of stCAPs. In addition, HSMM-CAP provides the inverse relationship between the number of states and sparsity. Simulation studies verify the performance of HSMM-CAP at different SNR levels. The spatiotemporal dynamics of stCAPs are also revealed by applying the proposed method to real-world resting-state fMRI data. Our method provides a new data-driven computational framework for revealing the brain spatiotemporal dynamics of resting-state fMRI data.
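The key modeling assumption, that brain states dwell for an explicit duration before transitioning (semi-Markov rather than Markov), can be shown with a small generative sketch; the Poisson duration law and transition matrix below are toy choices, not the paper's fitted parameters.

```python
# Minimal illustration of the semi-Markov assumption: each state draws an
# explicit dwell time before transitioning, unlike an ordinary HMM.
import numpy as np

def sample_semi_markov(n_frames, trans, mean_dur, rng=np.random.default_rng(0)):
    """trans: (K, K) transition probs with zero diagonal; mean_dur: (K,) Poisson means."""
    k = rng.integers(len(mean_dur))
    seq = []
    while len(seq) < n_frames:
        dur = 1 + rng.poisson(mean_dur[k])      # explicit state duration (dwell time)
        seq.extend([k] * dur)
        k = rng.choice(len(mean_dur), p=trans[k])
    return np.array(seq[:n_frames])
```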
{"title":"Co-Activation Pattern Analysis Based on Hidden Semi-Markov Model for Brain Spatiotemporal Dynamics","authors":"Zihao Yuan;Jiaqing Chen;Han Qiu;Houxiang Wang;Yangxin Huang;Fuchun Lin","doi":"10.1109/TMI.2025.3607113","DOIUrl":"10.1109/TMI.2025.3607113","url":null,"abstract":"Analyzing the spontaneous activity of the human brain using dynamic approaches can reveal functional organizations. The co-activation pattern (CAP) analysis of signals from different brain regions is used to characterize brain neural networks that may serve specialized functions. However, CAP is based on spatial information but ignores temporal reproducible transition patterns, and lacks robustness to low signal-to-noise rate (SNR) data. To address these issues, this study proposes a new CAP framework based on hidden semi-Markov model (HSMM) called HSMM-CAP analysis, which can be performed to investigate spatiotemporal CAPs (stCAPs) of the brain. HSMM-CAP uses empirical spatial distributions of stCAPs as emission models, and assumes that the state sequence of stCAPs follows a semi-Markov process. Based on the assumptions of sparsity, heterogeneity, and semi-Markov property of stCAPs, the HSMM-CAP-K-means method is constructed to infer the state sequence and transition parameters of stCAPs. In addition, HSMM-CAP provides the inverse relationship between the number of states and sparsity. Simulation studies verify the performance of HSMM-CAP at different levels of SNR. The spatiotemporal dynamics of stCAPs are also revealed by the proposed method on real-world resting-state fMRI data. Our method provides a new data-driven computational framework for revealing the brain spatiotemporal dynamics of resting-state fMRI data.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"843-852"},"PeriodicalIF":0.0,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145017614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MetaSSL: A General Heterogeneous Loss for Semi-Supervised Medical Image Segmentation
Pub Date : 2025-09-03 DOI: 10.1109/TMI.2025.3605617
Weiren Zhao;Lanfeng Zhong;Xin Liao;Wenjun Liao;Sichuan Zhang;Shaoting Zhang;Guotai Wang
Semi-Supervised Learning (SSL) is important for reducing the annotation cost of medical image segmentation models. State-of-the-art SSL methods such as Mean Teacher, FixMatch and Cross Pseudo Supervision (CPS) are mainly based on consistency regularization or pseudo-label supervision between a reference prediction and a supervised prediction. Despite their effectiveness, they have overlooked the potential noise in the labeled data, and mainly focus on strategies to generate the reference prediction while ignoring the heterogeneous values of different unlabeled pixels. We argue that effectively mining the rich information contained in the two predictions through the loss function, rather than the specific strategy used to obtain a reference prediction, is more essential for SSL, and we propose a universal framework MetaSSL based on a spatially heterogeneous loss that assigns different weights to pixels by simultaneously leveraging the uncertainty and consistency information between the reference and supervised predictions. Specifically, we split the predictions on unlabeled data into four regions with decreasing weights in the loss: Unanimous and Confident (UC), Unanimous and Suspicious (US), Discrepant and Confident (DC), and Discrepant and Suspicious (DS), where an adaptive threshold is proposed to distinguish confident predictions from suspicious ones. The heterogeneous loss is also applied to labeled images for robust learning, considering the potential annotation noise. Our method is plug-and-play and generalizes to most existing SSL methods. The experimental results showed that it improved the segmentation performance significantly when integrated with existing SSL frameworks on different datasets. Code is available at https://github.com/HiLab-git/MetaSSL
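A minimal sketch of the four-region weighting described above is given below: pixels are grouped by whether the two predictions agree and whether both are confident, and each group receives a different loss weight. The concrete weights and the fixed threshold are placeholders for the paper's adaptive scheme.

```python
# Hedged sketch of UC / US / DC / DS pixel weighting for a heterogeneous SSL loss.
import torch

def region_weights(p_ref, p_sup, conf_threshold, w=(1.0, 0.7, 0.4, 0.1)):
    """p_ref, p_sup: (N, C, H, W) softmax outputs from the two branches."""
    conf_ref, lab_ref = p_ref.max(dim=1)
    conf_sup, lab_sup = p_sup.max(dim=1)
    unanimous = lab_ref == lab_sup
    confident = torch.minimum(conf_ref, conf_sup) >= conf_threshold
    weights = torch.empty_like(conf_ref)
    weights[unanimous & confident] = w[0]       # UC: unanimous and confident
    weights[unanimous & ~confident] = w[1]      # US: unanimous and suspicious
    weights[~unanimous & confident] = w[2]      # DC: discrepant and confident
    weights[~unanimous & ~confident] = w[3]     # DS: discrepant and suspicious
    return weights                              # multiplied into the per-pixel loss
```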
{"title":"MetaSSL: A General Heterogeneous Loss for Semi-Supervised Medical Image Segmentation","authors":"Weiren Zhao;Lanfeng Zhong;Xin Liao;Wenjun Liao;Sichuan Zhang;Shaoting Zhang;Guotai Wang","doi":"10.1109/TMI.2025.3605617","DOIUrl":"10.1109/TMI.2025.3605617","url":null,"abstract":"Semi-Supervised Learning (SSL) is important for reducing the annotation cost for medical image segmentation models. State-of-the-art SSL methods such as Mean Teacher, FixMatch and Cross Pseudo Supervision (CPS) are mainly based on consistency regularization or pseudo-label supervision between a reference prediction and a supervised prediction. Despite the effectiveness, they have overlooked the potential noise in the labeled data, and mainly focus on strategies to generate the reference prediction, while ignoring the heterogeneous values of different unlabeled pixels. We argue that effectively mining the rich information contained by the two predictions in the loss function, instead of the specific strategy to obtain a reference prediction, is more essential for SSL, and propose a universal framework <bold>MetaSSL</b> based on a spatially heterogeneous loss that assigns different weights to pixels by simultaneously leveraging the uncertainty and consistency information between the reference and supervised predictions. Specifically, we split the predictions on unlabeled data into four regions with decreasing weights in the loss: Unanimous and Confident (UC), Unanimous and Suspicious (US), Discrepant and Confident (DC), and Discrepant and Suspicious (DS), where an adaptive threshold is proposed to distinguish confident predictions from suspicious ones. The heterogeneous loss is also applied to labeled images for robust learning considering the potential annotation noise. Our method is plug-and-play and general to most existing SSL methods. The experimental results showed that it improved the segmentation performance significantly when integrated with existing SSL frameworks on different datasets. Code is available at <uri>https://github.com/HiLab-git/MetaSSL</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"751-763"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144987556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Teacher–Student Instance-Level Adversarial Augmentation for Single Domain Generalized Medical Image Segmentation
Pub Date : 2025-09-02 DOI: 10.1109/TMI.2025.3605162
Zhengshan Wang;Long Chen;Xuelin Xie;Yang Zhang;Yunpeng Cai;Weiping Ding
Recently, single-source domain generalization (SDG) has gained popularity in medical image segmentation. As a prominent technique, adversarial image augmentation can generate synthetic training data that are challenging for the segmentation model to recognize. To avoid the over-augmentation problem, existing adversarial-based works often employ augmenters with relatively simple structures for medical images, typically operating at the image level, which limits the diversity of the augmented images. In this paper, we propose a Teacher-Student Instance-level Adversarial Augmentation (TSIAA) model for generalized medical image segmentation. The objective of TSIAA is to derive domain-generalizable representations by exploring out-of-source data distributions. First, we construct an Instance-level Image Augmenter (IIAG) using several Instance-level Augmentation Modules (IAMs), which are based on a learnable constrained Bézier transformation function. Compared to image-level adversarial augmentation, instance-level adversarial augmentation breaks the uniformity of augmentation rules across different structures within an image, thereby providing greater diversity. Then, TSIAA conducts Teacher-Student (TS) learning through an adversarial approach, alternating novel image augmentation and generalized representation learning. The former delves into out-of-source yet plausible data, while the latter continuously updates both the student and the teacher to ensure that the original and augmented features maintain consistent and generalized characteristics. By integrating both strategies, our proposed TSIAA model achieves significant improvements over state-of-the-art methods in four challenging SDG tasks. The code can be accessed at https://github.com/Wangzs0228/TSIAA
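A constrained Bézier transformation of the kind the augmentation modules build on can be sketched as a monotonic remapping of image intensities through a cubic Bézier curve; the control points below are illustrative guesses, whereas in the paper these parameters are learned adversarially per instance.

```python
# Sketch of a monotonic cubic Bezier intensity transformation used for augmentation.
import numpy as np

def bezier_intensity_transform(image, p1=(0.3, 0.6), p2=(0.7, 0.2), n=1000):
    """image: float array scaled to [0, 1]; p1, p2: inner control points in [0, 1]^2."""
    t = np.linspace(0.0, 1.0, n)
    pts = np.stack([(0.0, 0.0), p1, p2, (1.0, 1.0)])
    coeff = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    curve = coeff.T @ pts                       # (n, 2) points on the cubic Bezier curve
    xs, ys = curve[:, 0], curve[:, 1]
    return np.interp(image, xs, ys)             # remap intensities through the curve
```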
{"title":"Teacher–Student Instance-Level Adversarial Augmentation for Single Domain Generalized Medical Image Segmentation","authors":"Zhengshan Wang;Long Chen;Xuelin Xie;Yang Zhang;Yunpeng Cai;Weiping Ding","doi":"10.1109/TMI.2025.3605162","DOIUrl":"10.1109/TMI.2025.3605162","url":null,"abstract":"Recently, single-source domain generalization (SDG) has gained popularity in medical image segmentation. As a prominent technique, adversarial image augmentation technique can generate synthetic training data that are challenging for the segmentation model to recognize. To avoid the over-augmentation problem, existing adversarial-based works often employ augmenters with relatively simple structures for medical images, typically operating at the image level, limiting the diversity of the augmented images. In this paper, we propose a Teacher-Student Instance-level Adversarial Augmentation (TSIAA) model for generalized medical image segmentation. The objective of TSIAA is to derive domain-generalizable representations by exploring out-of-source data distributions. First, we construct an Instance-level Image Augmenter (IIAG) using several Instance-level Augmentation Modules (IAMs), which are based on the learnable constrained Bèzier transformation function. Compared to image-level adversarial augmentation, instance-level adversarial augmentation breaks the uniformity of augmentation rules across different structures within an image, thereby providing greater diversity. Then, TSIAA conducts Teacher-Student (TS) learning through an adversarial approach, alternating novel image augmentation and generalized representation learning. The former delves into out-of-source and plausible data, while the latter continuously updates both the student and teacher to ensure the original and augmented features maintain consistent and generalized characteristics. By integrating both strategies, our proposed TSIAA model achieves significant improvements over state-of-the-art methods in four challenging SDG tasks. The code can be accessed at <uri>https://github.com/Wangzs0228/TSIAA</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"764-776"},"PeriodicalIF":0.0,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144930697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Sequential Bayesian Iterative Learning for Myocardial Motion Estimation on Cardiac Image Sequences
Pub Date : 2025-08-18 DOI: 10.1109/TMI.2025.3599487
Shuxin Zhuang;Heye Zhang;Dong Liang;Hui Liu;Zhifan Gao
Motion estimation of the left ventricular myocardium on cardiac image sequences is crucial for assessing cardiac function. However, the intensity variation of cardiac image sequences brings the challenge of uncertain interference to myocardial motion estimation. Such imaging-related uncertain interference appears in different cardiac imaging modalities. We propose adaptive sequential Bayesian iterative learning to overcome this challenge. Specifically, our method applies adaptive structural inference to state transition and observation to cope with complex myocardial motion in an uncertain setting. In state transition, adaptive structural inference establishes a hierarchical structure recurrence to obtain the complex latent representation of cardiac image sequences. In state observation, adaptive structural inference forms a chain-structure mapping to correlate the latent representation of the cardiac image sequence with that of the motion. Extensive experiments on US, CMR, and TMR datasets covering 1270 patients (650 patients for CMR, 500 for US, and 120 for TMR) have shown the effectiveness of our method, as well as its superiority to eight state-of-the-art motion estimation methods.
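To make the state-transition / state-observation vocabulary concrete, the snippet below shows one generic sequential Bayesian (Kalman-style) predict-and-update step; the paper replaces these fixed linear-Gaussian models with adaptive structural inference, so this is background rather than the proposed method.

```python
# Generic sequential Bayesian filtering step: predict with the state-transition
# model, then update with the observation model.
import numpy as np

def kalman_step(mean, cov, obs, F, H, Q, R):
    """mean, cov: prior state; obs: new measurement; F, H, Q, R: model matrices."""
    mean_pred = F @ mean                         # state transition (predict)
    cov_pred = F @ cov @ F.T + Q
    innov = obs - H @ mean_pred                  # observation residual
    S = H @ cov_pred @ H.T + R
    K = cov_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    return mean_pred + K @ innov, (np.eye(len(mean)) - K @ H) @ cov_pred
```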
{"title":"Adaptive Sequential Bayesian Iterative Learning for Myocardial Motion Estimation on Cardiac Image Sequences","authors":"Shuxin Zhuang;Heye Zhang;Dong Liang;Hui Liu;Zhifan Gao","doi":"10.1109/TMI.2025.3599487","DOIUrl":"10.1109/TMI.2025.3599487","url":null,"abstract":"Motion estimation of left ventricle myocardium on the cardiac image sequence is crucial for assessing cardiac function. However, the intensity variation of cardiac image sequences brings the challenge of uncertain interference to myocardial motion estimation. Such imaging-related uncertain interference appears in different cardiac imaging modalities. We propose adaptive sequential Bayesian iterative learning to overcome the challenge. Specifically, our method applies the adaptive structural inference to state transition and observation to cope with a complex myocardial motion under uncertain setting. In state transition, adaptive structural inference establishes a hierarchical structure recurrence to obtain the complex latent representation of cardiac image sequences. In state observation, the adaptive structural inference forms a chain structure mapping to correlate the latent representation of the cardiac image sequence with that of the motion. Extensive experiments on US, CMR, and TMR datasets concerning 1270 patients (650 patients for CMR, 500 patients for US and 120 patients for TMR) have shown the effectiveness of our method, as well as the superiority to eight state-of-the-art motion estimation methods.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"406-420"},"PeriodicalIF":0.0,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical Contrastive Learning for Precise Whole-Body Anatomical Localization in PET/CT Imaging
Pub Date : 2025-08-18 DOI: 10.1109/TMI.2025.3599197
Yaozong Gao;Yiran Shu;Mingyang Yu;Yanbo Chen;Jingyu Liu;Shaonan Zhong;Weifang Zhang;Yiqiang Zhan;Xiang Sean Zhou;Xinlu Wang;Meixin Zhao;Dinggang Shen
Automatic anatomical localization is critical for radiology report generation. While many studies focus on lesion detection and segmentation, anatomical localization, i.e., accurately describing lesion positions in radiology reports, has received less attention. Conventional segmentation-based methods are limited to organ-level localization and often fail in severe disease cases due to low segmentation accuracy. To address these limitations, we reformulate anatomical localization as an image-to-text retrieval task. Specifically, we propose a CLIP-based framework that aligns lesion image patches with anatomically descriptive text embeddings in a shared multimodal space. By projecting lesion features into the semantic space and retrieving the most relevant anatomical descriptions in a coarse-to-fine manner, our method achieves fine-grained lesion localization with high accuracy across the entire body. Our main contributions are as follows: (1) hierarchical anatomical retrieval, which organizes 387 locations into a two-level hierarchy and retrieves from a first level of 124 coarse categories to narrow down the search space and reduce localization complexity; (2) augmented location descriptions, which integrate domain-specific anatomical knowledge to enhance semantic representation and improve visual-text alignment; and (3) semi-hard negative sample mining, which improves training stability and discriminative learning by avoiding overly similar negative samples that may introduce label noise or semantic ambiguity. We validate our method on two whole-body PET/CT datasets, achieving 84.13% localization accuracy on the internal test set and 80.42% on the external test set, with a per-lesion inference time of 34 ms. The proposed framework also demonstrated superior robustness in complex clinical cases compared to segmentation-based approaches.
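The coarse-to-fine retrieval step can be sketched as two nested nearest-neighbor searches in the shared embedding space: first over the 124 coarse categories, then only over the fine labels belonging to the winning category. The function and variable names below are hypothetical, not the authors' API.

```python
# Hedged sketch of hierarchical (coarse-to-fine) image-to-text retrieval.
import torch
import torch.nn.functional as F

def locate(lesion_emb, coarse_emb, fine_emb, children):
    """lesion_emb: (d,); coarse_emb: (124, d); fine_emb: (387, d);
    children: dict mapping a coarse index to the list of its fine-label indices."""
    q = F.normalize(lesion_emb, dim=-1)
    coarse = int(torch.argmax(F.normalize(coarse_emb, dim=-1) @ q))   # best coarse category
    cand = torch.tensor(children[coarse])
    fine_scores = F.normalize(fine_emb[cand], dim=-1) @ q             # search only its children
    return coarse, int(cand[torch.argmax(fine_scores)])
```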
{"title":"Hierarchical Contrastive Learning for Precise Whole-Body Anatomical Localization in PET/CT Imaging","authors":"Yaozong Gao;Yiran Shu;Mingyang Yu;Yanbo Chen;Jingyu Liu;Shaonan Zhong;Weifang Zhang;Yiqiang Zhan;Xiang Sean Zhou;Xinlu Wang;Meixin Zhao;Dinggang Shen","doi":"10.1109/TMI.2025.3599197","DOIUrl":"10.1109/TMI.2025.3599197","url":null,"abstract":"Automatic anatomical localization is critical for radiology report generation. While many studies focus on lesion detection and segmentation, anatomical localization—accurately describing lesion positions in radiology reports—has received less attention. Conventional segmentation-based methods are limited to organ-level localization and often fail in severe disease cases due to low segmentation accuracy. To address these limitations, we reformulate anatomical localization as an image-to-text retrieval task. Specifically, we propose a CLIP-based framework that aligns lesion image patches with anatomically descriptive text embeddings in a shared multimodal space. By projecting lesion features into the semantic space and retrieving the most relevant anatomical descriptions in a coarse-to-fine manner, our method achieves fine-grained lesion localization with high accuracy across the entire body. Our main contributions are as follows: (1) hierarchical anatomical retrieval, which organizes 387 locations into a two-level hierarchy, by retrieving from the first level of 124 coarse categories to narrow down the search space and reduce localization complexity; (2) augmented location descriptions, which integrate domain-specific anatomical knowledge for enhancing semantic representation and improving visual—text alignment; and (3) semi-hard negative sample mining, which improves training stability and discriminative learning by avoiding selecting the overly similar negative samples that may introduce label noise or semantic ambiguity. We validate our method on two whole-body PET/CT datasets, achieving an 84.13% localization accuracy on the internal test set and 80.42% on the external test set, with a per-lesion inference time of 34 ms. The proposed framework also demonstrated superior robustness in complex clinical cases compared to segmentation-based approaches.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"391-405"},"PeriodicalIF":0.0,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0