Conformal efficiency as a metric for comparative model assessment befitting federated learning

Wouter Heyndrickx, Adam Arany, Jaak Simm, Anastasia Pentina, Noé Sturm, Lina Humbeck, Lewis Mervin, Adam Zalewski, Martijn Oldenhof, Peter Schmidtke, Lukas Friedrich, Regis Loeb, Arina Afanasyeva, Ansgar Schuffenhauer, Yves Moreau, Hugo Ceulemans
Artificial Intelligence in the Life Sciences (journal article), published 2023-04-01
DOI: 10.1016/j.ailsci.2023.100070
https://www.sciencedirect.com/science/article/pii/S2667318523000144
Citations: 2

Abstract

In a drug discovery setting, pharmaceutical companies own substantial but confidential datasets. The MELLODDY project developed a privacy-preserving federated machine learning solution and deployed it at an unprecedented scale. Each partner built models for its own private assays that benefited from a shared representation. Established predictive performance metrics such as AUC ROC or AUC PR can only be evaluated on unseen but labeled chemical space and cannot gauge performance gains in unlabeled chemical space. Federated learning indirectly extends the labeled space, but in a privacy-preserving context a partner cannot use this label extension for performance assessment. Metrics that estimate the uncertainty of a prediction, by contrast, can be calculated even where no label is known. In practice, the chemical space covered by predictions that meet a given uncertainty threshold reflects the applicability domain of a model. We relate the efficiency from the conformal prediction framework (‘conformal efficiency’) to established performance metrics and then propose it as a proxy for the size of the applicability domain. A documented extension of the applicability domain would qualify as a tangible benefit of federated learning. In interim assessments, MELLODDY partners reported a median increase of 5.5% in conformal efficiency for the federated model over the single-partner model (with increases of up to 9.7%). Subject to distributional conditions, this efficiency increase can be interpreted directly as the expected increase in conformal, i.e., low-uncertainty, predictions. In conclusion, we present the first indication that privacy-preserving federated machine learning across massive drug-discovery datasets from ten pharma partners indeed extends the applicability domain of property prediction models.
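
The abstract's key point is that conformal efficiency can be computed where no labels are known. A minimal sketch of that idea follows, assuming a Mondrian (class-conditional) inductive conformal predictor for a binary activity classifier, a nonconformity score of one minus the predicted probability of the candidate label, and efficiency defined as the fraction of compounds whose prediction set contains exactly one label. These choices, the helper names (`mondrian_p_value`, `nonconformity`, `prediction_sets`, `conformal_efficiency`), the significance level `epsilon = 0.25`, and all toy probabilities are illustrative assumptions, not the MELLODDY implementation.

```python
"""Hypothetical sketch: Mondrian inductive conformal prediction for a binary
classifier and the resulting conformal efficiency (share of compounds that
receive exactly one label). Assumed setup, not the MELLODDY pipeline."""
from typing import List, Sequence, Set


def mondrian_p_value(cal_alphas: Sequence[float], alpha: float) -> float:
    # Share of same-class calibration compounds that are at least as
    # nonconforming as the test compound, with the usual +1 smoothing.
    at_least_as_strange = sum(1 for a in cal_alphas if a >= alpha)
    return (at_least_as_strange + 1) / (len(cal_alphas) + 1)


def nonconformity(p_active: float, label: int) -> float:
    # Assumed score: 1 - predicted probability of the candidate label.
    return 1.0 - p_active if label == 1 else p_active


def prediction_sets(cal_probs: Sequence[float], cal_labels: Sequence[int],
                    test_probs: Sequence[float],
                    epsilon: float = 0.25) -> List[Set[int]]:
    # Calibration scores are split by true class (Mondrian / class-conditional).
    cal_alphas = {0: [], 1: []}
    for p, y in zip(cal_probs, cal_labels):
        cal_alphas[y].append(nonconformity(p, y))

    # Test compounds need no labels: every candidate label is tried in turn
    # and kept if its p-value exceeds the significance level epsilon.
    sets = []
    for p in test_probs:
        region = {c for c in (0, 1)
                  if mondrian_p_value(cal_alphas[c], nonconformity(p, c)) > epsilon}
        sets.append(region)
    return sets


def conformal_efficiency(sets: Sequence[Set[int]]) -> float:
    # Fraction of single-label (confident) prediction sets.
    if not sets:
        raise ValueError("need at least one prediction set")
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Hypothetical toy numbers: P(active) from two models on the same compounds.
labels = [1, 1, 1, 1, 0, 0, 0, 0]                      # calibration labels only
single_cal = [0.9, 0.6, 0.4, 0.2, 0.1, 0.4, 0.6, 0.8]  # "single-partner" model
fed_cal = [0.95, 0.85, 0.7, 0.4, 0.05, 0.15, 0.3, 0.6]  # "federated" model
single_test = [0.95, 0.7, 0.5, 0.3, 0.05]               # unlabeled compounds
fed_test = [0.95, 0.75, 0.5, 0.25, 0.05]

single_sets = prediction_sets(single_cal, labels, single_test)
fed_sets = prediction_sets(fed_cal, labels, fed_test)
print(conformal_efficiency(single_sets), conformal_efficiency(fed_sets))
# prints: 0.4 0.8
```

Only the eight calibration compounds carry labels; the five test compounds are scored without them, which is why efficiency can be compared on unlabeled chemical space where AUC ROC or AUC PR cannot. The sharper hypothetical model turns two of the three ambiguous {0, 1} sets into single-label, i.e. confident, predictions. The invented values say nothing about the magnitudes reported in the study.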

Source journal: Artificial Intelligence in the Life Sciences
Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)