kMoL: an open-source machine and federated learning library for drug discovery

Journal of Cheminformatics (IF 5.7; Zone 2, Chemistry; Q1 in Chemistry, Multidisciplinary) Pub Date: 2025-02-25 DOI: 10.1186/s13321-025-00967-9
Romeo Cozac, Haris Hasic, Jun Jin Choong, Vincent Richard, Loic Beheshti, Cyrille Froehlich, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Hiroaki Iwata, Aki Hasegawa, Takao Otsuka, Yasushi Okuno

Abstract

Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the potential of such models in practice is challenging, as companies prioritize data privacy and security over collaboration initiatives to improve model performance and robustness. kMoL is an open-source machine learning library with integrated federated learning capabilities developed to address such challenges. Its key features include state-of-the-art model architectures, Bayesian optimization, explainability, and federated learning mechanisms. It demonstrates extensive customization possibilities, advanced security features, straightforward implementation of user-specific models, and high adaptability to custom datasets without additional programming requirements. kMoL is evaluated through locally trained benchmark settings and distributed federated learning experiments using various datasets to assess the features and flexibility of the library, as well as the ability to facilitate fast and practical experimentation. Additionally, results of these experiments provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines. kMoL is available on GitHub at https://github.com/elix-tech/kmol.

Scientific contribution

The primary scientific contribution of this research project is the introduction and evaluation of kMoL, an open-source machine learning library with integrated federated learning capabilities. By demonstrating advanced customization and security capabilities without additional programming requirements, kMoL represents an accessible yet secure open-source platform for collaborative drug discovery projects. Additionally, the experiment results provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines.
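To make the idea of federated learning more concrete, the sketch below shows sample-size-weighted parameter averaging (in the style of FedAvg), one common way a server can combine models trained by separate parties without seeing their data. It is an illustration only: the abstract does not say which aggregation strategies kMoL uses, and the function name `federated_average`, the parameter names, and the client data are all hypothetical and not part of the kMoL API.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client parameters, weighting each client by its sample count.

    `updates` holds one (weights, num_samples) pair per client. Only model
    parameters are exchanged; the clients' raw data never leaves them.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Two illustrative clients with made-up parameters and dataset sizes.
clients = [
    ({"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}, 100),
    ({"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}, 300),
]
global_weights = federated_average(clients)
print(global_weights)
```

In this toy run the second client holds three times as much data, so the aggregated parameters sit three quarters of the way toward its values. Real deployments typically add secure communication and repeat this step over many training rounds.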
