FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets

Patterns · Published 2024-01-12 · DOI: 10.1016/j.patter.2023.100893 · Impact Factor 6.7 · Q1, Computer Science, Artificial Intelligence
Vasileios C. Pezoulas, Fanis Kalatzis, Themis P. Exarchos, Andreas Goules, Athanasios G. Tzioufas, Dimitrios I. Fotiadis
{"title":"FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets","authors":"Vasileios C. Pezoulas, Fanis Kalatzis, Themis P. Exarchos, Andreas Goules, Athanasios G. Tzioufas, Dimitrios I. Fotiadis","doi":"10.1016/j.patter.2023.100893","DOIUrl":null,"url":null,"abstract":"<p>Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"14 1","pages":""},"PeriodicalIF":6.7000,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Patterns","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.patter.2023.100893","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).
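The abstract names two building blocks: gradient boosted trees with per-iteration dropout (DART-style boosting) and explicit handling of class imbalance, trained across multiple federated sites. The sketch below is only an illustration of those ingredients under stated assumptions, not the authors' FHBF algorithm or its hybrid weight update, which the abstract does not detail. It uses LightGBM's DART mode with a dropout rate and a `scale_pos_weight` imbalance correction on a synthetic imbalanced dataset split across three simulated sites, and averages the sites' predicted probabilities as a hypothetical stand-in for a real federated aggregation step.

```python
# Illustrative sketch only -- NOT the authors' FHBF implementation.
# Combines DART-style boosting (dropout on trees) with class-imbalance
# weighting across simulated "sites"; probability averaging below is a
# hypothetical stand-in for federated aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, recall_score
from lightgbm import LGBMClassifier

# Highly imbalanced synthetic data standing in for clinical records.
X, y = make_classification(n_samples=6000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# Simulate three federated sites by partitioning the training data.
site_indices = np.array_split(
    np.random.RandomState(0).permutation(len(X_train)), 3)

site_models = []
for idx in site_indices:
    X_s, y_s = X_train[idx], y_train[idx]
    # DART boosting with a dropout rate; scale_pos_weight counters imbalance.
    pos_weight = (y_s == 0).sum() / max((y_s == 1).sum(), 1)
    model = LGBMClassifier(boosting_type="dart", drop_rate=0.1,
                           n_estimators=200, scale_pos_weight=pos_weight,
                           random_state=0)
    model.fit(X_s, y_s)
    site_models.append(model)

# Crude "aggregation": average the sites' predicted probabilities.
proba = np.mean([m.predict_proba(X_test)[:, 1] for m in site_models], axis=0)
pred = (proba >= 0.5).astype(int)
print("log loss:", log_loss(y_test, proba))
print("sensitivity:", recall_score(y_test, pred))
```

In a real federated setting the sites would exchange model updates rather than pool predictions; the paper's hybrid weight update and its treatment of dropouts in the loss function are what this toy aggregation does not capture.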

Source journal: Patterns (subject area: Decision Sciences, all)
CiteScore: 10.60
Self-citation rate: 4.60%
Articles published per year: 153
Average review time: 19 weeks