在不平衡模式的非 IID 数据集上进行癌症分期的多模式联合学习

Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour
{"title":"在不平衡模式的非 IID 数据集上进行癌症分期的多模式联合学习","authors":"Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour","doi":"10.1109/TMI.2024.3450855","DOIUrl":null,"url":null,"abstract":"<p><p>The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities.\",\"authors\":\"Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour\",\"doi\":\"10.1109/TMI.2024.3450855\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.</p>\",\"PeriodicalId\":94033,\"journal\":{\"name\":\"IEEE transactions on medical imaging\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on medical imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TMI.2024.3450855\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TMI.2024.3450855","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

通过医学影像分析将机器学习(ML)用于癌症分期的做法在医学各学科中引起了广泛关注。如果辅以创新的联合学习(FL)框架,机器学习技术就能进一步克服与患者数据暴露相关的隐私问题。鉴于患者记录中经常出现不同的数据模式,在多模式学习框架中利用 FL 对癌症分期具有相当大的前景。然而,现有的多模态 FL 工作通常假定所有数据收集机构都能访问所有数据模态。这种过于简化的方法忽略了系统中只能访问部分数据模式的机构。在这项工作中,我们介绍了一种新颖的 FL 架构,其设计不仅考虑到了数据样本的异质性,还考虑到了各机构数据模式的固有异质性/不均匀性。我们阐明了在我们的 FL 系统中,不同数据模式的收敛速度不同所带来的挑战。随后,我们提出了应对这些挑战的解决方案,即为多模 FL 量身定制分布式梯度混合和近距离感知客户端加权策略。为了证明我们的方法的优越性,我们使用癌症基因组图谱计划(TCGA)数据集进行了实验,考虑了不同的癌症类型和三种数据模式:mRNA 序列、组织病理学图像数据和临床信息。我们的结果进一步揭示了不同机构间基于类别与基于类型的异质性对模型性能的影响和严重程度,从而拓宽了多模态 FL 文献中数据异质性概念的视野。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities.

The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Building a Synthetic Vascular Model: Evaluation in an Intracranial Aneurysms Detection Scenario. FAMF-Net: Feature Alignment Mutual Attention Fusion with Region Awareness for Breast Cancer Diagnosis via Imbalanced Data. Table of Contents Corrections to “Contrastive Graph Pooling for Explainable Classification of Brain Networks” Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1