3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model†

IF 7.4 | CAS Zone 1 (Chemistry) | JCR Q1, CHEMISTRY, MULTIDISCIPLINARY | Chemical Science | Pub Date: 2024-12-04 | DOI: 10.1039/D4SC06864E
Jike Wang, Hao Luo, Rui Qin, Mingyang Wang, Xiaozhe Wan, Meijing Fang, Odin Zhang, Qiaolin Gou, Qun Su, Chao Shen, Ziyi You, Liwei Liu, Chang-Yu Hsieh, Tingjun Hou and Yu Kang
Chemical Science, 2, 637-648. Article page: https://pubs.rsc.org/en/content/articlelanding/2025/sc/d4sc06864e | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629531/pdf/

Abstract

The generation of three-dimensional (3D) molecules based on target structures is a cutting-edge challenge in drug discovery. Many existing approaches produce molecules with invalid configurations, unphysical conformations, suboptimal drug-like qualities, and limited synthesizability, and they require long generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that operates exclusively on tokens. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combine them into a unified full-dimensional representation, and pre-train the model on a dataset of tens of millions of drug-like molecules. This token-only approach allows the model to learn both the 2D and 3D characteristics of molecules at scale. We then fine-tune the model on paired structural data of protein pockets and molecules, followed by reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results show that 3DSMILES-GPT generates molecules that consistently outperform those of existing methods in binding affinity, drug-likeness (quantitative estimate of drug-likeness, QED), and synthetic accessibility score (SAS). Notably, it improves QED by 33% while maintaining state-of-the-art binding affinity as estimated by Vina docking. Generation is also fast, averaging approximately 0.45 seconds per generation, about three times faster than the fastest existing methods. 3DSMILES-GPT therefore has the potential to advance 3D molecular generation in drug discovery.
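The abstract does not specify how the 2D SMILES string and the 3D coordinates are merged into one token stream. As a purely illustrative sketch, assume a hypothetical layout in which the SMILES tokens are followed by one (x, y, z) triple per heavy atom, in the order the atoms appear in the SMILES, with each coordinate rounded to 0.1 Å and written as its own text token. The special tokens `<bos>`, `<coords>` and `<eos>`, the rounding precision, and the function names `tokenize_smiles`, `encode_3d` and `decode_3d` are all assumptions for illustration, not the actual 3DSMILES format; the tokenizer is a regex-based SMILES tokenizer of the kind commonly used in chemical language models.

```python
import re
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Regex-based SMILES tokenizer: bracket atoms, Br/Cl, aromatic atoms,
# bonds, branches, ring-closure digits and %nn ring labels.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
# Tokens that denote an atom (each needs one coordinate triple in this sketch).
ATOM_PATTERN = re.compile(r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p)")


def tokenize_smiles(smiles: str) -> List[str]:
    tokens = SMILES_PATTERN.findall(smiles)
    # Reject strings containing characters the regex silently skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def encode_3d(smiles: str, coords: Sequence[Coord], decimals: int = 1) -> List[str]:
    """Hypothetical layout: <bos> SMILES tokens <coords> x y z ... <eos>."""
    smi_tokens = tokenize_smiles(smiles)
    n_atoms = sum(1 for t in smi_tokens if ATOM_PATTERN.fullmatch(t))
    if n_atoms != len(coords):
        raise ValueError(f"{n_atoms} atoms in SMILES but {len(coords)} coordinate triples")
    tokens = ["<bos>", *smi_tokens, "<coords>"]
    for xyz in coords:
        # Each coordinate is rounded and emitted as a separate text token.
        tokens.extend(f"{v:.{decimals}f}" for v in xyz)
    tokens.append("<eos>")
    return tokens


def decode_3d(tokens: List[str]) -> Tuple[str, List[Coord]]:
    if not tokens or tokens[0] != "<bos>" or tokens[-1] != "<eos>":
        raise ValueError("sequence must start with <bos> and end with <eos>")
    sep = tokens.index("<coords>")  # raises ValueError if the separator is missing
    smiles = "".join(tokens[1:sep])
    values = [float(t) for t in tokens[sep + 1 : -1]]
    if len(values) % 3 != 0:
        raise ValueError("coordinate tokens do not form complete (x, y, z) triples")
    coords = [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]
    return smiles, coords


if __name__ == "__main__":
    # Ethanol heavy atoms with made-up coordinates (Å), for illustration only.
    toks = encode_3d("CCO", [(-1.2, 0.3, 0.0), (0.2, -0.4, 0.1), (1.3, 0.5, -0.2)])
    print(toks)
    print(decode_3d(toks))
```

Writing each rounded coordinate as a discrete token keeps the vocabulary finite for a bounded coordinate range, so an ordinary GPT-style next-token model can emit both the molecular graph and its geometry in a single pass. The trade-off in this sketch is a quantization error of up to half the rounding step per coordinate (0.05 Å with `decimals=1`).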

