
Latest articles in IEEE Transactions on Medical Imaging

Understanding Brain Functional Dynamics Through Neural Koopman Operator With Control Mechanism
Pub Date : 2025-06-17 DOI: 10.1109/TMI.2025.3580611
Zhixuan Zhou;Tingting Dan;Guorong Wu
One of the fundamental scientific problems in neuroscience is to understand how cognition and behavior emerge from brain function. Since the neuroscience concept of cognitive control parallels the notion of system control in engineering, many computational models formulate the dynamic neural process as a dynamical system, in which the hidden states of the complex neural system are modulated by energetic simulations. However, the human brain is a quintessential complex biological system. Current computational models either use neural networks to approximate the underlying dynamics, which makes it difficult to fully understand the system mechanics, or fall back on simplified linear models with very limited power to characterize the non-linear, self-organized dynamics of complex neural activity. To address this challenge, we devise an end-to-end deep model that identifies the underlying brain dynamics based on Koopman operator theory, which allows us to model a complex non-linear system in an infinite-dimensional linear space. In the context of reverse engineering, we further propose a biology-inspired control module that adjusts the input (neural activity data) based on feedback to align brain dynamics with the underlying cognitive task. We have applied our deep model to predict cognitive states from a large body of existing neuroimaging data by identifying the latent dynamical system of functional fluctuations. Promising results demonstrate the potential of establishing a system-level understanding of the intricate relationship between brain function and cognition through explainable deep models.
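The Koopman formulation above lends itself to a compact illustration: a learned encoder lifts the observed neural state into a latent space where one-step dynamics are approximately linear, and a feedback-derived control input shifts the latent state toward the target task. The sketch below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' released implementation; the module names, dimensions, and the single-step advance K z + B u are assumptions made for illustration.

```python
# Minimal sketch of a Koopman-style latent linear model with a control input.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class KoopmanControlModel(nn.Module):
    def __init__(self, n_rois=116, latent_dim=64):
        super().__init__()
        # Encoder lifts the non-linear neural state x_t into a latent space
        # where one-step evolution is modeled linearly (Koopman assumption).
        self.encoder = nn.Sequential(
            nn.Linear(n_rois, 256), nn.GELU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.GELU(), nn.Linear(256, n_rois))
        # Finite-dimensional approximation of the Koopman operator K
        # and an input matrix B for the feedback control signal u_t.
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(latent_dim, latent_dim, bias=False)
        # Toy feedback controller: u_t derived from the current latent state.
        self.controller = nn.Linear(latent_dim, latent_dim)

    def forward(self, x_t):
        z_t = self.encoder(x_t)                 # lift to latent observables
        u_t = torch.tanh(self.controller(z_t))  # feedback control signal
        z_next = self.K(z_t) + self.B(u_t)      # linear advance in latent space
        x_next_hat = self.decoder(z_next)       # map back to ROI signal space
        return x_next_hat, z_t, z_next

# One-step prediction loss on consecutive fMRI frames x_t -> x_{t+1}.
model = KoopmanControlModel()
x_t, x_next = torch.randn(8, 116), torch.randn(8, 116)
x_next_hat, _, _ = model(x_t)
loss = nn.functional.mse_loss(x_next_hat, x_next)
loss.backward()
```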
{"title":"Understanding Brain Functional Dynamics Through Neural Koopman Operator With Control Mechanism","authors":"Zhixuan Zhou;Tingting Dan;Guorong Wu","doi":"10.1109/TMI.2025.3580611","DOIUrl":"10.1109/TMI.2025.3580611","url":null,"abstract":"One of the fundamental scientific problems in neuroscience is to have a good understanding of how cognition and behavior emerge from brain function. Since the neuroscience concept of cognitive control parallels the notion of system control in engineering, many computational models formulate the dynamics neural process into a dynamical system, where the hidden states of the complex neural system are modulated by energetic simulations. However, the human brain is a quintessential complex biological system. Current computation models either use neural networks to approximate the underlying dynamics, which makes it difficult to fully understand the system mechanics, or compromise to simplified linear models with very limited power to characterize non-linear and self-organized dynamics along with complex neural activities. To address this challenge, we devise an end-to-end deep model to identify the underlying brain dynamics based on Koopman operator theory, which allows us to model a complex non-linear system in an infinite-dimensional linear space. In the context of reverse engineering, we further propose a biology-inspired control module that adjusts the input (neural activity data) based on feedback to align brain dynamics with the underlying cognitive task. We have applied our deep model to predict cognitive states from a large scale of existing neuroimaging data by identifying the latent dynamic system of functional fluctuations. Promising results demonstrate the potential of establishing a system-level understanding of the intricate relationship between brain function and cognition through the landscape of explainable deep models.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4627-4638"},"PeriodicalIF":0.0,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Teacher Temporal Regulation Network for Surgical Workflow Recognition
Pub Date : 2025-06-16 DOI: 10.1109/TMI.2025.3580082
Kaide Huang;Xiang-Lei Yuan;Rui-De Liu;Lian-Song Ye;Yao Zhou;Bing Hu;Zhang Yi
Automatic recognition of surgical workflow plays a vital role in modern operating rooms. Given the complex nature and extended duration of surgical videos, accurate recognition of surgical workflow is highly challenging. Despite being widely studied, existing methods still face two major limitations: insufficient visual feature extraction and performance degradation caused by inconsistency between training and testing features. To address these limitations, this paper proposes a Multi-Teacher Temporal Regulation Network (MTTR-Net) for surgical workflow recognition. To extract discriminative visual features, we introduce a “sequence of clips” training strategy. This strategy employs a set of sparsely sampled video clips as input to train the feature encoder and incorporates an auxiliary temporal regularizer to model long-range temporal dependencies across these clips, ensuring the feature encoder captures critical information from each frame. Then, to mitigate the inconsistency between training and testing features, we further develop a cross-mimicking strategy that iteratively trains multiple feature encoders on different data subsets to generate consistent mimicked features. A temporal encoder is trained on these mimicked features to achieve stable performance during testing. Extensive experiments on eight public surgical video datasets demonstrate that our MTTR-Net outperforms state-of-the-art methods across various metrics. Our code has been released at https://github.com/kaideH/MGTR-Net
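As a rough illustration of the "sequence of clips" idea, the snippet below sparsely samples a handful of short clips from the frame index of a long surgical video. The clip count, clip length, stride, and uniform-segment sampling scheme are assumptions made for illustration, not details taken from the paper.

```python
# Hedged sketch: sparse "sequence of clips" sampling from a long video.
# Segment boundaries, clip length, and randomness are illustrative choices.
import random

def sample_clip_sequence(num_frames, num_clips=8, clip_len=16, stride=2, seed=None):
    """Split the video into num_clips equal segments and draw one short,
    strided clip from a random position inside each segment."""
    rng = random.Random(seed)
    span = clip_len * stride                  # frames covered by one clip
    seg = num_frames / num_clips              # segment length (may be fractional)
    clips = []
    for i in range(num_clips):
        lo = int(i * seg)
        hi = max(lo, int((i + 1) * seg) - span)
        start = rng.randint(lo, hi) if hi > lo else lo
        clip = [min(start + j * stride, num_frames - 1) for j in range(clip_len)]
        clips.append(clip)
    return clips                              # list of frame-index lists

# Example: an 80-minute video at 1 fps -> 4800 frames, 8 sparse clips of 16 frames.
print(sample_clip_sequence(4800, seed=0)[0])
```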
{"title":"Multi-Teacher Temporal Regulation Network for Surgical Workflow Recognition","authors":"Kaide Huang;Xiang-Lei Yuan;Rui-De Liu;Lian-Song Ye;Yao Zhou;Bing Hu;Zhang Yi","doi":"10.1109/TMI.2025.3580082","DOIUrl":"10.1109/TMI.2025.3580082","url":null,"abstract":"Automatic recognition of surgical workflow plays a vital role in modern operating rooms. Given the complex nature and extended duration of surgical videos, accurate recognition of surgical workflow is highly challenging. Despite being widely studied, existing methods still face two major limitations: insufficient visual feature extraction and performance degradation caused by inconsistency between training and testing features. To address these limitations, this paper proposes a Multi-Teacher Temporal Regulation Network (MTTR-Net) for surgical workflow recognition. To extract discriminative visual features, we introduce a “sequence of clips” training strategy. This strategy employs a set of sparsely sampled video clips as input to train the feature encoder and incorporates an auxiliary temporal regularizer to model long-range temporal dependencies across these clips, ensuring the feature encoder captures critical information from each frame. Then, to mitigate the inconsistency between training and testing features, we further develop a cross-mimicking strategy that iteratively trains multiple feature encoders on different data subsets to generate consistent mimicked features. A temporal encoder is trained on these mimicked features to achieve stable performance during testing. Extensive experiments on eight public surgical video datasets demonstrate that our MTTR-Net outperforms state-of-the-art methods across various metrics. Our code has been released at <uri>https://github.com/kaideH/MGTR-Net</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4690-4703"},"PeriodicalIF":0.0,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144304753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Free-Hand 3-D Photoacoustic and Ultrasound Reconstruction Using Deep Learning
Pub Date : 2025-06-13 DOI: 10.1109/TMI.2025.3579454
SiYeoul Lee;Seonho Kim;MinKyung Seo;SeongKyu Park;Salehin Imrus;Kambaluru Ashok;DongEon Lee;Chunsu Park;SeonYeong Lee;Jiye Kim;Jae-Heung Yoo;MinWoo Kim
This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view (FoV) and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconstruction, faces significant challenges in accurate motion estimation without relying on external positional sensors. MoGLo-Net addresses these limitations through an innovative adaptation of the self-attention mechanism, which effectively exploits critical regions, such as fully developed speckle areas or highly echogenic tissue regions within successive ultrasound images, to accurately estimate the motion parameters. This facilitates the extraction of intricate features from individual frames. Additionally, we employ a patch-wise correlation operation to generate a correlation volume that is highly correlated with the scanning motion. A custom loss function was also developed that leverages the characteristics of the motion parameters to ensure robust learning with minimal bias. Experimental evaluations demonstrated that MoGLo-Net surpasses current state-of-the-art methods in both quantitative and qualitative performance metrics. Furthermore, we expanded the application of 3D reconstruction technology beyond simple B-mode ultrasound volumes to incorporate Doppler ultrasound and photoacoustic imaging, enabling 3D visualization of vasculature. The source code for this study is publicly available at: https://github.com/pnu-amilab/US3D
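The patch-wise correlation operation mentioned above can be pictured as comparing each location of one frame's feature map with spatially shifted locations of the next frame, yielding a correlation volume that varies with probe motion. The sketch below, assuming PyTorch feature maps and a small search radius, is only an illustrative rendering of such an operation, not the MoGLo-Net implementation.

```python
# Hedged sketch of a patch-wise correlation volume between two feature maps.
# Feature shapes, normalization, and the search radius are assumptions.
import torch
import torch.nn.functional as F

def correlation_volume(feat_a, feat_b, radius=4):
    """feat_a, feat_b: (B, C, H, W) features of consecutive frames.
    Returns (B, (2*radius+1)**2, H, W): dot products between each location
    in feat_a and shifted locations in feat_b within the search window."""
    b, c, h, w = feat_a.shape
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    padded = F.pad(feat_b, [radius] * 4)            # zero-pad spatial dims
    corrs = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            corrs.append((feat_a * shifted).sum(dim=1, keepdim=True))
    return torch.cat(corrs, dim=1)

fa, fb = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(correlation_volume(fa, fb).shape)   # torch.Size([2, 81, 32, 32])
```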
{"title":"Enhancing Free-Hand 3-D Photoacoustic and Ultrasound Reconstruction Using Deep Learning","authors":"SiYeoul Lee;Seonho Kim;MinKyung Seo;SeongKyu Park;Salehin Imrus;Kambaluru Ashok;DongEon Lee;Chunsu Park;SeonYeong Lee;Jiye Kim;Jae-Heung Yoo;MinWoo Kim","doi":"10.1109/TMI.2025.3579454","DOIUrl":"10.1109/TMI.2025.3579454","url":null,"abstract":"This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view (FoV) and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconstruction, faces significant challenges in accurate motion estimation without relying on external positional sensors. MoGLo-Net addresses these limitations through an innovative adaptation of the self-attention mechanism, which effectively exploits the critical regions, such as fully-developed speckle areas or high-echogenic tissue regions within successive ultrasound images to accurately estimate the motion parameters. This facilitates the extraction of intricate features from individual frames. Additionally, we employ a patch-wise correlation operation to generate a correlation volume that is highly correlated with the scanning motion. A custom loss function was also developed to ensure robust learning with minimized bias, leveraging the characteristics of the motion parameters. Experimental evaluations demonstrated that MoGLo-Net surpasses current state-of-the-art methods in both quantitative and qualitative performance metrics. Furthermore, we expanded the application of 3D reconstruction technology beyond simple B-mode ultrasound volumes to incorporate Doppler ultrasound and photoacoustic imaging, enabling 3D visualization of vasculature. The source code for this study is publicly available at: <uri>https://github.com/pnu-amilab/US3D</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4652-4665"},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11036110","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144288376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
OneTouch Automated Photoacoustic and Ultrasound Imaging of Breast in Standing Pose
Pub Date : 2025-06-12 DOI: 10.1109/TMI.2025.3578929
Huijuan Zhang;Emily Zheng;Wenhan Zheng;Chuqin Huang;Yunqi Xi;Yanda Cheng;Shuliang Yu;Saptarshi Chakraborty;Ermelinda Bonaccio;Kazuaki Takabe;Xinhao C. Fan;Wenyao Xu;Jun Xia
We developed an automated photoacoustic and ultrasound breast tomography system that images the patient in the standing pose. The system, named OneTouch-PAT, utilizes linear transducer arrays with optical-acoustic combiners for effective dual-modal imaging. During scanning, subjects only need to gently press their breasts against the imaging window, and co-registered three-dimensional ultrasonic and photoacoustic images of the breast can be obtained within one minute. Our system has a large field of view of 17 cm by 15 cm and achieves an imaging depth of 3 cm with sub-millimeter resolution. A three-dimensional deep-learning network was also developed to further improve image quality by enhancing 3D resolution and vasculature, eliminating skin signals, and reducing noise. The performance of the system was tested on four healthy subjects and 61 patients with breast cancer. Our results indicate that the ultrasound structural information can be combined with the photoacoustic vascular information for better tissue characterization. Representative cases from different molecular subtypes exhibit distinct photoacoustic and ultrasound features that could potentially be used for imaging-based cancer classification. Statistical analysis across all patients indicates that regional photoacoustic intensity and vessel branching points are indicators of breast malignancy. These promising results suggest that our system could significantly enhance breast cancer diagnosis and classification.
{"title":"OneTouch Automated Photoacoustic and Ultrasound Imaging of Breast in Standing Pose","authors":"Huijuan Zhang;Emily Zheng;Wenhan Zheng;Chuqin Huang;Yunqi Xi;Yanda Cheng;Shuliang Yu;Saptarshi Chakraborty;Ermelinda Bonaccio;Kazuaki Takabe;Xinhao C. Fan;Wenyao Xu;Jun Xia","doi":"10.1109/TMI.2025.3578929","DOIUrl":"10.1109/TMI.2025.3578929","url":null,"abstract":"We developed an automated photoacoustic and ultrasound breast tomography system that images the patient in the standing pose. The system, named OneTouch-PAT, utilized linear transducer arrays with optical-acoustic combiners for effective dual-modal imaging. During scanning, subjects only need to gently attach their breasts to the imaging window, and co-registered three-dimensional ultrasonic and photoacoustic images of the breast can be obtained within one minute. Our system has a large field of view of 17 cm by 15 cm and achieves an imaging depth of 3 cm with sub-millimeter resolution. A three-dimensional deep-learning network was also developed to further improve the image quality by improving the 3D resolution, enhancing vasculature, eliminating skin signals, and reducing noise. The performance of the system was tested on four healthy subjects and 61 patients with breast cancer. Our results indicate that the ultrasound structural information can be combined with the photoacoustic vascular information for better tissue characterization. Representative cases from different molecular subtypes have indicated different photoacoustic and ultrasound features that could potentially be used for imaging-based cancer classification. Statistical analysis among all patients indicates that the regional photoacoustic intensity and vessel branching points are indicators of breast malignancy. These promising results suggest that our system could significantly enhance breast cancer diagnosis and classification.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4617-4626"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tackling Tumor Heterogeneity Issue: Transformer-Based Multiple Instance Enhancement Learning for Predicting EGFR Mutation via CT Images
Pub Date : 2025-06-12 DOI: 10.1109/TMI.2025.3578995
Yulin Fang;Minghui Wang;Qilong Song;Chi Cao;Ziyu Gao;Biao Song;Xuhong Min;Ao Li
Accurate and non-invasive prediction of epidermal growth factor receptor (EGFR) mutation is crucial for the diagnosis and treatment of non-small cell lung cancer (NSCLC). While computed tomography (CT) imaging shows promise in identifying EGFR mutation, current prediction methods rely heavily on fully supervised learning, which overlooks the substantial heterogeneity of tumors and therefore leads to suboptimal results. To tackle the tumor heterogeneity issue, this study introduces a novel weakly supervised method named TransMIEL, which leverages multiple instance learning techniques for accurate EGFR mutation prediction. Specifically, we first propose an innovative instance enhancement learning (IEL) strategy that strengthens the discriminative power of instance features for complex tumor CT images by exploring self-derived soft pseudo-labels. Next, to improve tumor representation capability, we design a spatial-aware transformer (SAT) that fully captures inter-instance relationships among different pathological subregions, mirroring the diagnostic process of radiologists. Finally, an instance adaptive gating (IAG) module is developed to effectively emphasize the contribution of informative instance features in heterogeneous tumors, facilitating dynamic instance feature aggregation and improving model generalization. Experimental results demonstrate that TransMIEL significantly outperforms existing fully and weakly supervised methods on both public and in-house NSCLC datasets. Additionally, visualization results show that our approach can highlight intra-tumor and peri-tumor areas relevant to EGFR mutation status. Therefore, our method holds significant potential as an effective tool for EGFR prediction and offers a novel perspective for future research on tumor heterogeneity.
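For readers unfamiliar with multiple instance learning, the core of instance gating and aggregation can be sketched as gated attention pooling over instance (sub-region) features, where each instance's contribution to the bag-level (patient-level) prediction is learned. The code below is a generic gated-attention MIL head in the spirit of the instance adaptive gating idea; it is a hypothetical illustration, not the TransMIEL architecture, and all dimensions are assumed.

```python
# Hedged sketch of gated attention pooling over instance features for
# bag-level (patient-level) EGFR mutation prediction. Dimensions are assumed.
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, instances):
        # instances: (N, feat_dim) features of N tumor sub-region instances.
        gate = self.attn_v(instances) * self.attn_u(instances)   # gated branch
        scores = self.attn_w(gate).squeeze(-1)                   # (N,)
        weights = torch.softmax(scores, dim=0)                   # instance weights
        bag = (weights.unsqueeze(-1) * instances).sum(dim=0)     # weighted pooling
        return self.classifier(bag), weights

mil = GatedAttentionMIL()
logits, w = mil(torch.randn(37, 512))       # 37 instances from one patient
print(logits.shape, w.sum().item())         # torch.Size([2]) ~1.0
```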
{"title":"Tackling Tumor Heterogeneity Issue: Transformer-Based Multiple Instance Enhancement Learning for Predicting EGFR Mutation via CT Images","authors":"Yulin Fang;Minghui Wang;Qilong Song;Chi Cao;Ziyu Gao;Biao Song;Xuhong Min;Ao Li","doi":"10.1109/TMI.2025.3578995","DOIUrl":"10.1109/TMI.2025.3578995","url":null,"abstract":"Accurate and non-invasive prediction of epidermal growth factor receptor (EGFR) mutation is crucial for the diagnosis and treatment of non-small cell lung cancer (NSCLC). While computed tomography (CT) imaging shows promise in identifying EGFR mutation, current prediction methods heavily rely on fully supervised learning, which overlooks the substantial heterogeneity of tumors and therefore leads to suboptimal results. To tackle tumor heterogeneity issue, this study introduces a novel weakly supervised method named TransMIEL, which leverages multiple instance learning techniques for accurate EGFR mutation prediction. Specifically, we first propose an innovative instance enhancement learning (IEL) strategy that strengthens the discriminative power of instance features for complex tumor CT images by exploring self-derived soft pseudo-labels. Next, to improve tumor representation capability, we design a spatial-aware transformer (SAT) that fully captures inter-instance relationships of different pathological subregions to mirror the diagnostic processes of radiologists. Finally, an instance adaptive gating (IAG) module is developed to effectively emphasize the contribution of informative instance features in heterogeneous tumors, facilitating dynamic instance feature aggregation and increasing model generalization performance. Experimental results demonstrate that TransMIEL significantly outperforms existing fully and weakly supervised methods on both public and in-house NSCLC datasets. Additionally, visualization results show that our approach can highlight intra-tumor and peri-tumor areas relevant to EGFR mutation status. Therefore, our method holds significant potential as an effective tool for EGFR prediction and offers a novel perspective for future research on tumor heterogeneity.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4524-4535"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rethinking Brain Tumor Segmentation From the Frequency Domain Perspective
Pub Date : 2025-06-12 DOI: 10.1109/TMI.2025.3579213
Minye Shao;Zeyu Wang;Haoran Duan;Yawen Huang;Bing Zhai;Shizheng Wang;Yang Long;Yefeng Zheng
Precise segmentation of brain tumors, particularly the contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. In particular, current methods exhibit notable performance degradation when segmenting these enhancing tumor areas, largely because MRI-specific tumor characteristics such as complex textures and directional variations are insufficiently considered. To address this, we propose the Harmonized Frequency Fusion Network (HFF-Net), which rethinks brain tumor segmentation from a frequency-domain perspective. To comprehensively characterize tumor regions, we develop a Frequency Domain Decomposition (FDD) module that separates MR images into low-frequency components, which capture smooth tumor contours, and high-frequency components, which highlight detailed textures and directional edges. To further enhance sensitivity to tumor boundaries, we introduce an Adaptive Laplacian Convolution (ALC) module that adaptively emphasizes critical high-frequency details using dynamically updated convolution kernels. To effectively fuse tumor features across multiple scales, we design a Frequency Domain Cross-Attention (FDCA) module that integrates semantic, positional, and slice-specific information. We further validate and interpret the frequency-domain improvements through visualization, theoretical reasoning, and experimental analyses. Extensive experiments on four public datasets demonstrate that HFF-Net achieves an average relative improvement of 4.48% (ranging from 2.39% to 7.72%) in mean Dice score across the three major subregions, and an average relative improvement of 7.33% (ranging from 5.96% to 8.64%) in the segmentation of contrast-enhancing tumor regions, while maintaining favorable computational efficiency and clinical applicability. Our code is available at: https://github.com/VinyehShaw/HFF
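A frequency-domain decomposition of the kind described can be pictured as an FFT-based split of each MRI slice into a low-pass component (smooth contours) and its high-pass residual (textures and edges). The snippet below is a simplified, assumed rendering of that split with a hard radial cutoff; HFF-Net's actual FDD module may differ.

```python
# Hedged sketch: split an MRI slice into low- and high-frequency components
# with a radial mask in Fourier space. The cutoff fraction is an assumption.
import numpy as np

def frequency_decompose(slice_2d, cutoff=0.1):
    """Return (low_freq, high_freq) images that sum back to slice_2d."""
    f = np.fft.fftshift(np.fft.fft2(slice_2d))
    h, w = slice_2d.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    mask = (r <= cutoff * min(h, w)).astype(float)   # keep a low-frequency disc
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = slice_2d - low                            # residual: textures, edges
    return low, high

img = np.random.rand(240, 240).astype(np.float32)
low, high = frequency_decompose(img)
print(np.allclose(low + high, img, atol=1e-5))       # True
```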
{"title":"Rethinking Brain Tumor Segmentation From the Frequency Domain Perspective","authors":"Minye Shao;Zeyu Wang;Haoran Duan;Yawen Huang;Bing Zhai;Shizheng Wang;Yang Long;Yefeng Zheng","doi":"10.1109/TMI.2025.3579213","DOIUrl":"10.1109/TMI.2025.3579213","url":null,"abstract":"Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. However, current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient consideration of MRI-specific tumor features such as complex textures and directional variations. To address this, we propose the Harmonized Frequency Fusion Network (HFF-Net), which rethinks brain tumor segmentation from a frequency-domain perspective. To comprehensively characterize tumor regions, we develop a Frequency Domain Decomposition (FDD) module that separates MRI images into low-frequency components, capturing smooth tumor contours and high-frequency components, highlighting detailed textures and directional edges. To further enhance sensitivity to tumor boundaries, we introduce an Adaptive Laplacian Convolution (ALC) module that adaptively emphasizes critical high-frequency details using dynamically updated convolution kernels. To effectively fuse tumor features across multiple scales, we design a Frequency Domain Cross-Attention (FDCA) integrating semantic, positional, and slice-specific information. We further validate and interpret frequency-domain improvements through visualization, theoretical reasoning, and experimental analyses. Extensive experiments on four public datasets demonstrate that HFF-Net achieves an average relative improvement of 4.48% (ranging from 2.39% to 7.72%) in the mean Dice scores across the three major subregions, and an average relative improvement of 7.33% (ranging from 5.96% to 8.64%) in the segmentation of contrast-enhancing tumor regions, while maintaining favorable computational efficiency and clinical applicability. Our code is available at: <uri>https://github.com/VinyehShaw/HFF</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4536-4553"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Discovering Differential Imaging Genetic Modules via Multimodal Fusion-Based Hypergraph Transductive Learning in Alzheimer’s Disease Diagnosis
Pub Date : 2025-06-11 DOI: 10.1109/TMI.2025.3578601
Meiling Wang;Liang Sun;Wei Shao;Daoqiang Zhang
Brain imaging genetics is a topic of broad interest that has achieved great success in the diagnosis of complex brain disorders. In clinical practice, most existing data fusion approaches extract features from homogeneous data, neglecting the heterogeneous structural information among imaging genetic data. In addition, the number of labeled samples is limited by the cost and time of manual labeling. To remedy these deficiencies, in this work, we present a multimodal fusion-based hypergraph transductive learning method (MFHT) for clinical diagnosis. Specifically, for each modality, we first construct a corresponding similarity graph that reflects the similarity between subjects using the label prior. Then, a multiple-graph fusion approach with a theoretical convergence guarantee is designed to learn a unified graph that harnesses the structure of the entire dataset. Finally, to fully exploit the rich information of the obtained graph, a hypergraph transductive learning approach is designed to effectively capture the complex structures and high-order relationships in both labeled and unlabeled data and produce the diagnosis results. Brain imaging genetic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets are used to evaluate the developed method. The results show that our method is well suited to the analysis of brain imaging genetic data, accounting for genetics, brain imaging (region of interest (ROI) node features), and brain imaging (connectivity edge features), to improve the understanding of disease mechanisms as well as clinical diagnosis.
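To make the graph construction and fusion steps concrete, the toy sketch below builds one subject-by-subject similarity graph per modality and averages them into a unified graph. MFHT's actual fusion is an optimization with a convergence guarantee, so the Gaussian-kernel similarity and the simple average here are only assumed stand-ins for illustration.

```python
# Hedged sketch: per-modality subject similarity graphs fused into one graph.
# Gaussian-kernel similarity and plain averaging are illustrative assumptions;
# the paper's fusion is an optimization with a convergence guarantee.
import numpy as np

def similarity_graph(features, sigma=1.0):
    """features: (n_subjects, n_features) -> (n_subjects, n_subjects) graph."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return w

def fuse_graphs(graphs):
    """Average modality-specific graphs and re-normalize rows."""
    fused = np.mean(graphs, axis=0)
    row_sum = fused.sum(axis=1, keepdims=True) + 1e-12
    return fused / row_sum

rng = np.random.default_rng(0)
imaging = similarity_graph(rng.normal(size=(50, 90)))    # e.g. ROI features
genetics = similarity_graph(rng.normal(size=(50, 200)))  # e.g. SNP features
unified = fuse_graphs([imaging, genetics])
print(unified.shape)                                     # (50, 50)
```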
{"title":"Discovering Differential Imaging Genetic Modules via Multimodal Fusion-Based Hypergraph Transductive Learning in Alzheimer’s Disease Diagnosis","authors":"Meiling Wang;Liang Sun;Wei Shao;Daoqiang Zhang","doi":"10.1109/TMI.2025.3578601","DOIUrl":"10.1109/TMI.2025.3578601","url":null,"abstract":"Brain imaging genetics is a widely focused topic, which has achieved the great successes in the diagnosis of complex brain disorders. In clinical practice, most existing data fusion approaches extract features from homogeneous data, neglecting the heterogeneous structural information among imaging genetic data. In addition, the number of labeled samples is limited due to the cost and time of manually labeling data. To remedy such deficiencies, in this work, we present a multimodal fusion-based hypergraph transductive learning (MFHT) for clinical diagnosis. Specifically, for each modality, we first construct a corresponding similarity graph to reflect the similarity between subjects using the label prior. Then, the multiple graph fusion approach based on theoretical convergence guarantee is designed for learning a unified graph harnessing the structure of entire data. Finally, to fully exploit the rich information of the obtained graph, a hypergraph transductive learning approach is designed to effectively capture the complex structures and high-order relationships in both labeled and unlabeled data to achieve the diagnosis results. The brain imaging genetic data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets are used to experimentally explore our developed method. Related results show that our method is well applied to the analysis of brain imaging genetic data, which accounts for genetics, brain imaging (region of interest (ROI) node features), and brain imaging (connectivity edge features) to boost the understanding of disease mechanism as well as improve clinical diagnosis.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4592-4604"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Polarity Prompting Vision Foundation Models for Pathology Image Analysis
Pub Date : 2025-06-10 DOI: 10.1109/TMI.2025.3578492
Chong Yin;Siqi Liu;Kaiyang Zhou;Vincent Wai-Sun Wong;Pong C. Yuen
The sharp rise in non-alcoholic fatty liver disease (NAFLD) cases has become a major health concern in recent years. Accurately identifying tissue alteration regions is crucial for NAFLD diagnosis but challenging with small-scale pathology datasets. Recently, prompt tuning has emerged as an effective strategy for adapting vision models to small-scale data analysis. However, current prompting techniques, designed primarily for general image classification, use generic cues that are inadequate when dealing with the intricacies of pathological tissue analysis. To solve this problem, we introduce Quantitative Attribute-based Polarity Visual Prompting (Q-PoVP), a new prompting method for pathology image analysis. Q-PoVP introduces two types of measurable attributes: K-function-based spatial attributes and histogram-based morphological attributes. Both help to measure tissue conditions quantitatively. We develop a quantitative attribute-based polarity visual prompt generator that converts quantitative visual attributes into positive and negative visual prompts, facilitating a more comprehensive and nuanced interpretation of pathological images. To enhance feature discrimination, we introduce a novel orthogonal-based polarity visual prompt tuning technique that disentangles and amplifies positive visual attributes while suppressing negative ones. We extensively tested our method on three different tasks. Our task-specific prompting demonstrates superior performance in both diagnostic accuracy and interpretability compared to existing methods. This dual advantage makes it particularly valuable for clinical settings, where healthcare providers require not only reliable results but also transparent reasoning to support informed patient care decisions. Code is available at https://github.com/7LFB/Q-PoVP
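The K-function-based spatial attribute referenced above is presumably in the family of Ripley's K statistic, which measures, for each radius r, how many neighboring points (for example, cell nuclei centroids) fall within r of a typical point relative to the overall point density. The snippet below is a plain, non-edge-corrected estimate offered only as an assumed illustration of such a quantitative spatial attribute, not the Q-PoVP implementation.

```python
# Hedged sketch: a simple (non-edge-corrected) Ripley's K estimate over
# nuclei centroids in a tissue patch. Radii and the unit square are assumed.
import numpy as np

def ripley_k(points, radii, area=1.0):
    """points: (n, 2) coordinates; returns K(r) for each radius r."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-pairs
    lam = n / area                             # point density
    return np.array([(d < r).sum() / (n * lam) for r in radii])

rng = np.random.default_rng(1)
centroids = rng.uniform(size=(200, 2))        # nuclei positions in a unit patch
radii = np.linspace(0.02, 0.2, 5)
print(ripley_k(centroids, radii))             # ~pi*r^2 under spatial randomness
```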
{"title":"Polarity Prompting Vision Foundation Models for Pathology Image Analysis","authors":"Chong Yin;Siqi Liu;Kaiyang Zhou;Vincent Wai-Sun Wong;Pong C. Yuen","doi":"10.1109/TMI.2025.3578492","DOIUrl":"10.1109/TMI.2025.3578492","url":null,"abstract":"The sharp rise in non-alcoholic fatty liver disease (NAFLD) cases has become a major health concern in recent years. Accurately identifying tissue alteration regions is crucial for NAFLD diagnosis but challenging with small-scale pathology datasets. Recently, prompt tuning has emerged as an effective strategy for adapting vision models to small-scale data analysis. However, current prompting techniques, designed primarily for general image classification, use generic cues that are inadequate when dealing with the intricacies of pathological tissue analysis. To solve this problem, we introduce Quantitative Attribute-based Polarity Visual Prompting (Q-PoVP), a new prompting method for pathology image analysis. Q-PoVP introduces two types of measurable attributes: K-function-based spatial attributes and histogram-based morphological attributes. Both help to measure tissue conditions quantitatively. We develop a quantitative attribute-based polarity visual prompt generator that converts quantitative visual attributes into positive and negative visual prompts, facilitating a more comprehensive and nuanced interpretation of pathological images. To enhance feature discrimination, we introduce a novel orthogonal-based polarity visual prompt tuning technique that disentangles and amplifies positive visual attributes while suppressing negative ones. We extensively tested our method on three different tasks. Our task-specific prompting demonstrates superior performance in both diagnostic accuracy and interpretability compared to existing methods. This dual advantage makes it particularly valuable for clinical settings, where healthcare providers require not only reliable results but also transparent reasoning to support informed patient care decisions. Code is available at <uri>https://github.com/7LFB/Q-PoVP</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4579-4591"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pediatric Corpulence Assessment Using Ultra-Wideband Radar Imaging System: A Novel Approach in Tissue Characterization
Pub Date : 2025-06-10 DOI: 10.1109/TMI.2025.3578283
Kapil Gangwar;Fatemeh Modares Sabzevari;Karumudi Rambabu
This article proposes an ex-vivo method to estimate the dielectric properties and thickness of adipose tissue in the human body. Obesity levels are then assessed based on the electrical properties of the adipose tissue. The approach consists of two steps: 1) data acquisition with an ultrawideband (UWB) time-domain radar and 2) genetic algorithm optimization of the intended goal function. This study considers a three-layered tissue model to mimic the surface of the human abdomen. The experimental phantom consists of a pork skin layer followed by pork fat, then ground pork to emulate the muscle tissue. An aperture with a diameter of 2 cm on a metal sheet focuses the measurements on a small area of interest. The measured results were compared with the actual permittivity and thickness of the different layers of the experimental phantom. The technique is also applied to human voxel tissue models available in the CST software library, including babies, children, and adults. The accuracy of the measured data confirms the suitability of this technique. This technique is a noninvasive, safe, cost-effective method to determine the type of fat tissue in the human body and the level of obesity.
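Step 2, the genetic-algorithm optimization, can be pictured as searching over candidate (permittivity, thickness) values for the fat layer so that a forward model of the layered-tissue response matches the measured radar data. The sketch below uses a toy two-feature forward model and a bare-bones genetic algorithm; the forward model, fitness function, parameter bounds, and GA settings are all assumptions for illustration, not the paper's goal function.

```python
# Hedged sketch: a bare-bones genetic algorithm fitting fat-layer permittivity
# and thickness to a toy reflection model. The forward model and GA settings
# are illustrative assumptions, not the paper's goal function.
import numpy as np

rng = np.random.default_rng(0)

def forward_model(eps_r, thickness_mm):
    """Toy stand-in for the layered-tissue radar response (2 features)."""
    return np.array([np.sqrt(eps_r) * thickness_mm, (eps_r - 1.0) / (eps_r + 1.0)])

measured = forward_model(5.5, 12.0)            # pretend measurement (eps=5.5, 12 mm)

def fitness(pop):
    return np.array([-np.sum((forward_model(e, t) - measured) ** 2) for e, t in pop])

# Population of candidates within assumed bounds: eps_r in [2, 12], d in [2, 30] mm.
pop = np.column_stack([rng.uniform(2, 12, 60), rng.uniform(2, 30, 60)])
for _ in range(100):
    fit = fitness(pop)
    parents = pop[np.argsort(fit)[-20:]]                         # keep best 20
    p1 = parents[rng.integers(0, 20, 40)]
    p2 = parents[rng.integers(0, 20, 40)]
    mix = rng.random((40, 1))
    children = mix * p1 + (1 - mix) * p2                         # crossover
    children += rng.normal(0, [0.2, 0.5], size=children.shape)   # mutation
    children = np.clip(children, [2, 2], [12, 30])
    pop = np.vstack([parents, children])

best = pop[np.argmax(fitness(pop))]
print(best)    # should approach [5.5, 12.0]
```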
{"title":"Pediatric Corpulence Assessment Using Ultra-Wideband Radar Imaging System: A Novel Approach in Tissue Characterization","authors":"Kapil Gangwar;Fatemeh Modares Sabzevari;Karumudi Rambabu","doi":"10.1109/TMI.2025.3578283","DOIUrl":"10.1109/TMI.2025.3578283","url":null,"abstract":"This article proposes an ex-vivo method to estimate the dielectric properties and thickness of adipose tissue in the human body. Based on the electrical properties of adipose tissue, obesity levels will be assessed. This approach consists of two steps: 1) data acquisition by an ultrawideband (UWB) time-domain radar and 2) genetic algorithm optimization of the intended goal function. This study considers a three-layered tissue model to mimic the surface of the human abdomen. The experimental phantom consists of a pork skin layer followed by pork fat, then ground pork to emulate the muscle tissue. An aperture with a diameter of 2 cm on a metal sheet focuses the measurements on a small area of interest. The measured results were compared with the actual permittivity and thickness of different layers of the experimental phantom. The technique is also applied to human voxel tissue models available in the CST software library, including babies, children, and adults. The accuracy of measurement data confirms the suitability of this technique. This technique is a noninvasive, safe, cost-effective method to determine the type of fat tissue in the human body and the level of obesity.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4554-4566"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Language and Attenuation-Driven Network for Robot-Assisted Cholangiocarcinoma Diagnosis From Optical Coherence Tomography
Pub Date : 2025-06-09 DOI: 10.1109/TMI.2025.3578179
Chuanhao Zhang;Yangxi Li;Jianping Song;Yuxuan Zhai;Yuchao Zheng;Yingwei Fan;Canhong Xiang;Fang Chen;Hongen Liao
Automatic and accurate classification of cholangiocarcinoma (CCA) using optical coherence tomography (OCT) images is critical for confirming infiltration margins. Considering that the morphological representations seen in pathology stains can be implicitly captured in OCT imaging, we introduce the optical attenuation coefficient (OAC) and generalized visual-language information to focus on the optical properties of diseased tissue and exploit its inherent texture features. Maintaining the data within the appropriate working range during OCT scanning is crucial for reliable diagnosis. To this end, we propose an autonomous scanning method integrated with a novel deep learning architecture to construct an efficient computer-aided system. We develop a cross-modal complementarity model, the language and attenuation-driven network (LA-OCT Net), designed to enhance the interaction between OAC and OCT information and leverage generalized image-text alignment for refined feature representation. The model incorporates a disentangled attenuation selection-based adversarial correlation loss to magnify the discrepancy between cross-modal features while maintaining discriminative consistency. The proposed robot-assisted pipeline ensures precise repositioning of the diseased cross-sectional location, allowing consistent measurement over the course of treatment and precise tumor margin detection. Extensive experiments on a comprehensive clinical dataset demonstrate the effectiveness and superiority of our method. Specifically, our approach not only improves accuracy by 6% compared with state-of-the-art techniques but also provides new insights into the potential of optical biopsy.
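The optical attenuation coefficient referenced above can be estimated depth-resolved from an OCT A-scan using a commonly used tail-sum relation, in which the coefficient at a pixel is proportional to that pixel's intensity divided by the summed intensities of the pixels below it. The snippet below sketches that estimate under an assumed pixel spacing and a synthetic A-scan; it illustrates the OAC input to such a network, not the paper's exact preprocessing.

```python
# Hedged sketch: depth-resolved optical attenuation coefficient (OAC) from an
# OCT A-scan, following the commonly used tail-sum relation
#   mu[i] ~ I[i] / (2 * dz * sum_{j>i} I[j]).
# Pixel spacing and the synthetic A-scan are illustrative assumptions.
import numpy as np

def attenuation_coefficient(a_scan, dz_mm=0.005):
    """a_scan: (depth,) linear-intensity A-scan -> OAC in mm^-1 per pixel."""
    intensity = np.clip(np.asarray(a_scan, dtype=float), 1e-12, None)
    tail = np.cumsum(intensity[::-1])[::-1] - intensity   # sum of pixels below i
    return intensity / (2.0 * dz_mm * np.maximum(tail, 1e-12))

# Synthetic A-scan from a homogeneous layer with mu = 2 mm^-1 (Beer-Lambert decay).
z = np.arange(600) * 0.005
a_scan = np.exp(-2.0 * 2.0 * z)          # round-trip attenuation
mu = attenuation_coefficient(a_scan)
print(mu[:5])                            # roughly 2 mm^-1 near the surface
```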
{"title":"Language and Attenuation-Driven Network for Robot-Assisted Cholangiocarcinoma Diagnosis From Optical Coherence Tomography","authors":"Chuanhao Zhang;Yangxi Li;Jianping Song;Yuxuan Zhai;Yuchao Zheng;Yingwei Fan;Canhong Xiang;Fang Chen;Hongen Liao","doi":"10.1109/TMI.2025.3578179","DOIUrl":"10.1109/TMI.2025.3578179","url":null,"abstract":"Automatic and accurate classification of cholangiocarcinoma (CCA) using optical coherence tomography (OCT) images is critical for confirming infiltration margins. Considering that the morphological representations in pathology stains can be implicitly captured in OCT imaging, we introduce the optical attenuation coefficient (OAC) and generalized visual-language information to focus on the optical properties of diseased tissue and exploit its inherent textured features. Maintaining the data within the appropriate working range during OCT scanning is crucial for reliable diagnosis. To this end, we propose an autonomous scanning method integrated with novel deep learning architecture to construct an efficient computer-aided system. We develop a cross-modal complementarity model, the language and attenuation-driven network (LA-OCT Net), designed to enhance the interaction between OAC and OCT information and leverage generalized image-text alignment for refined feature representation. The model incorporates a disentangled attenuation selection-based adversarial correlation loss to magnify the discrepancy between cross-modal features while maintaining discriminative consistency. The proposed robot-assisted pipeline ensures precise repositioning of the diseased cross-sectional location, allowing consistent measurements to treatment and precise tumor margin detection. Extensive experiments on a comprehensive clinical dataset demonstrate the effectiveness and superiority of our method. Specifically, our approach not only improves accuracy by 6% compared to state-of-the-art techniques, while also providing new insights into the potential of optical biopsy.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4511-4523"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144251989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0