Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules With Desirable Properties

IF 3.6 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-26 DOI:10.1109/TCBB.2024.3434461
Siyuan Guo;Jihong Guan;Shuigeng Zhou
{"title":"Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules With Desirable Properties","authors":"Siyuan Guo;Jihong Guan;Shuigeng Zhou","doi":"10.1109/TCBB.2024.3434461","DOIUrl":null,"url":null,"abstract":"In the past decade, Artificial Intelligence (AI) driven drug design and discovery has been a hot research topic in the AI area, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue mainly the basic properties like \n<italic>validity</i>\n and \n<italic>uniqueness</i>\n of the generated molecules, a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which makes most generated molecules little usefulness in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel \n<italic>electronic effect</i>\n based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets QM9 and ZINC250 k show that the molecules generated by our proposed method have better \n<italic>validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP</i>\n than those generated by current SOTA models.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2050-2063"},"PeriodicalIF":3.6000,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10612764/","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

In the past decade, Artificial Intelligence (AI) driven drug design and discovery has been a hot research topic in the AI area, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue mainly the basic properties like validity and uniqueness of the generated molecules, a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which makes most generated molecules little usefulness in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic effect based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets QM9 and ZINC250 k show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
两级扩散和优化多种特性:生成具有理想特性的分子的新方法。
在过去十年中,人工智能(AI)驱动的药物设计与发现一直是人工智能领域的研究热点,其中一个重要分支是通过生成模型生成分子,从基于 GAN 的模型、基于 VAE 的模型到最新的基于扩散的模型。然而,大多数现有模型主要追求生成分子的有效性和唯一性等基本属性,少数模型则进一步明确优化某个重要的分子属性(如 QED 或 PlogP),这使得大多数生成的分子在实践中用处不大。在本文中,我们提出了一种生成具有理想特性的分子的新方法,通过多种创新设计扩展了扩散模型框架。新颖之处有两方面。一方面,考虑到分子结构复杂多样,而分子特性通常由一些子结构(如药理结构)决定,我们建议分别在分子和分子片段这两个结构层次上进行扩散,从而获得混合高斯分布的反向扩散过程。为了得到理想的分子片段,我们开发了一种基于电子效应的新型破碎方法。另一方面,我们介绍了在扩散模型框架下明确优化多种分子特性的两种方法。首先,由于潜在药物分子必须具有化学有效性,我们通过能量引导函数来优化分子有效性。其次,由于潜在药物分子应具有各种理想特性,我们采用了一种多目标机制来同时优化多种分子特性。用两个基准数据集 QM9 和 ZINC250k 进行的大量实验表明,我们提出的方法生成的分子在有效性、唯一性、新颖性、Fr´echet ChemNet Distance (FCD)、QED 和 PlogP 等方面都优于目前的 SOTA 模型。D2L-OMP 的代码见 https://github.com/bz99bz/D2L-OMP。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
7.50
自引率
6.70%
发文量
479
审稿时长
3 months
期刊介绍: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; and important biological results that are obtained from the use of these methods, programs and databases; the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system
期刊最新文献
Guest Editorial Guest Editorial for the 20th Asia Pacific Bioinformatics Conference iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning. DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model. Performance Comparison between Deep Neural Network and Machine Learning based Classifiers for Huntington Disease Prediction from Human DNA Sequence. AI-based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1