MolCFL: A personalized and privacy-preserving drug discovery framework based on generative clustered federated learning

IF 4.5 2区医学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Journal of Biomedical Informatics Pub Date : 2024-09-01 DOI:10.1016/j.jbi.2024.104712

Yan Guo, Yongqiang Gao, Jiawei Song

{"title":"MolCFL: A personalized and privacy-preserving drug discovery framework based on generative clustered federated learning","authors":"Yan Guo, Yongqiang Gao, Jiawei Song","doi":"10.1016/j.jbi.2024.104712","DOIUrl":null,"url":null,"abstract":"<div><p>In today’s era of rapid development of large models, the traditional drug development process is undergoing a profound transformation. The vast demand for data and consumption of computational resources are making independent drug discovery increasingly difficult. By integrating federated learning technology into the drug discovery field, we have found a solution that both protects privacy and shares computational power. However, the differences in data held by various pharmaceutical institutions and the diversity in drug design objectives have exacerbated the issue of data heterogeneity, making traditional federated learning consensus models unable to meet the personalized needs of all parties. In this study, we introduce and evaluate an innovative drug discovery framework, MolCFL, which utilizes a multi-layer perceptron (MLP) as the generator and a graph convolutional network (GCN) as the discriminator in a generative adversarial network (GAN). By learning the graph structure of molecules, it generates new molecules in a highly personalized manner and then optimizes the learning process by clustering federated learning, grouping compound data with high similarity. MolCFL not only enhances the model’s ability to protect privacy but also significantly improves the efficiency and personalization of molecular design. MolCFL exhibits superior performance when handling non-independently and identically distributed data compared to traditional models. Experimental results show that the framework demonstrates outstanding performance on two benchmark datasets, with the generated new molecules achieving over 90% in Uniqueness and close to 100% in Novelty. MolCFL not only improves the quality and efficiency of drug molecule design but also, through its highly customized clustered federated learning environment, promotes collaboration and specialization in the drug discovery process while ensuring data privacy. These features make MolCFL a powerful tool suitable for addressing the various challenges faced in the modern drug research and development field.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104712"},"PeriodicalIF":4.5000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1532046424001308","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

In today’s era of rapid development of large models, the traditional drug development process is undergoing a profound transformation. The vast demand for data and consumption of computational resources are making independent drug discovery increasingly difficult. By integrating federated learning technology into the drug discovery field, we have found a solution that both protects privacy and shares computational power. However, the differences in data held by various pharmaceutical institutions and the diversity in drug design objectives have exacerbated the issue of data heterogeneity, making traditional federated learning consensus models unable to meet the personalized needs of all parties. In this study, we introduce and evaluate an innovative drug discovery framework, MolCFL, which utilizes a multi-layer perceptron (MLP) as the generator and a graph convolutional network (GCN) as the discriminator in a generative adversarial network (GAN). By learning the graph structure of molecules, it generates new molecules in a highly personalized manner and then optimizes the learning process by clustering federated learning, grouping compound data with high similarity. MolCFL not only enhances the model’s ability to protect privacy but also significantly improves the efficiency and personalization of molecular design. MolCFL exhibits superior performance when handling non-independently and identically distributed data compared to traditional models. Experimental results show that the framework demonstrates outstanding performance on two benchmark datasets, with the generated new molecules achieving over 90% in Uniqueness and close to 100% in Novelty. MolCFL not only improves the quality and efficiency of drug molecule design but also, through its highly customized clustered federated learning environment, promotes collaboration and specialization in the drug discovery process while ensuring data privacy. These features make MolCFL a powerful tool suitable for addressing the various challenges faced in the modern drug research and development field.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

MolCFL：基于生成聚类联合学习的个性化和保护隐私的药物发现框架。

在当今大型模型快速发展的时代，传统的药物研发过程正在经历一场深刻的变革。对数据的巨大需求和对计算资源的消耗使得独立药物发现变得越来越困难。通过将联合学习技术融入药物研发领域，我们找到了一种既能保护隐私又能共享计算能力的解决方案。然而，不同制药机构所掌握数据的差异和药物设计目标的多样性加剧了数据异构问题，使得传统的联合学习共识模型无法满足各方的个性化需求。在本研究中，我们介绍并评估了一种创新的药物发现框架--MolCFL，它在生成对抗网络（GAN）中使用多层感知器（MLP）作为生成器，图卷积网络（GCN）作为判别器。通过学习分子的图结构，它能以高度个性化的方式生成新分子，然后通过聚类联合学习优化学习过程，将具有高度相似性的复合数据分组。MolCFL 不仅增强了模型保护隐私的能力，还显著提高了分子设计的效率和个性化程度。与传统模型相比，MolCFL 在处理非独立和同分布数据时表现出更优越的性能。实验结果表明，该框架在两个基准数据集上表现出色，生成的新分子唯一性超过90%，新颖性接近100%。MolCFL不仅提高了药物分子设计的质量和效率，而且通过其高度定制的集群联合学习环境，促进了药物发现过程中的协作和专业化，同时确保了数据隐私。这些特点使 MolCFL 成为一个强大的工具，适用于应对现代药物研发领域面临的各种挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Biomedical Informatics 医学-计算机：跨学科应用

CiteScore

8.90

自引率

6.70%

发文量

243

审稿时长

32 days

期刊介绍： The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.