Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms

Patterns (IF 6.7, JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2024-03-14. DOI: 10.1016/j.patter.2024.100947
Debsindhu Bhowmik, Pei Zhang, Zachary Fox, Stephan Irle, John Gounley

Abstract

This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
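As a rough illustration of how a masked model can act as the mutation step of a genetic algorithm, the following minimal sketch evolves a small population of token sequences. It is not the authors' implementation: the vocabulary, the `fitness` function, and all helper names are hypothetical, a uniform random sampler stands in for the trained masked LM, and the GAN component of the hybrid architecture is omitted entirely.

```python
import random

# Hypothetical toy vocabulary of SMILES-like tokens (assumption, not from the paper).
VOCAB = ["C", "N", "O", "F", "S"]
MASK = "[MASK]"


def mask_tokens(tokens, rate, rng):
    """Replace a random subset of positions (at least one) with MASK."""
    n = max(1, int(len(tokens) * rate))
    positions = set(rng.sample(range(len(tokens)), n))
    return [MASK if i in positions else t for i, t in enumerate(tokens)]


def fill_masks(tokens, rng):
    """Stand-in for a masked LM: fill each MASK with a uniformly random token."""
    return [rng.choice(VOCAB) if t == MASK else t for t in tokens]


def fitness(tokens):
    """Toy property score (assumption): fraction of non-carbon tokens."""
    return sum(t != "C" for t in tokens) / len(tokens)


def evolve(population, generations, rate=0.2, seed=0):
    """Mutate by mask-and-fill, then keep the best individuals each generation."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        children = [fill_masks(mask_tokens(p, rate, rng), rng) for p in population]
        # Elitist selection over parents and children keeps the population size fixed.
        population = sorted(population + children, key=fitness, reverse=True)[:size]
    return population
```

In a real pipeline, `fill_masks` would be replaced by predictions from a trained masked language model and `fitness` by the property being optimized; the small fixed population here loosely mirrors the small-population setting the abstract mentions.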
Source journal: Patterns (Decision Sciences: Decision Sciences (all))
CiteScore: 10.60
Self-citation rate: 4.60%
Articles published: 153
Review time: 19 weeks