Large language models design sequence-defined macromolecules via evolutionary optimization
Authors: Wesley F. Reinhart, Antonia Statt
Journal: npj Computational Materials (JCR Q1, Chemistry, Physical; impact factor 9.4)
Published: 2024-11-18
DOI: https://doi.org/10.1038/s41524-024-01449-6
Abstract
We demonstrate the ability of a large language model to perform evolutionary optimization for materials discovery. Anthropic’s Claude 3.5 model outperforms an active learning scheme with handcrafted surrogate models and an evolutionary algorithm in selecting monomer sequences to produce targeted morphologies in macromolecular self-assembly. Utilizing pre-trained language models can potentially reduce the need for hyperparameter tuning while offering new capabilities such as self-reflection. The model performs this task effectively with or without context about the task itself, but domain-specific context sometimes results in faster convergence to good solutions. Furthermore, when this context is withheld, the model infers an approximate notion of the task (e.g., calling it a protein folding problem). This work provides evidence of Claude 3.5’s ability to act as an evolutionary optimizer, a recently discovered emergent behavior of large language models, and demonstrates a practical use case in the study and design of soft materials.
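To make the idea of evolutionary optimization over monomer sequences concrete, here is a minimal Python sketch. It is not the authors' implementation: the two-letter alphabet `MONOMERS`, the objective `toy_fitness`, and the helpers `mutate`, `propose_random`, and `evolve` are all hypothetical names introduced for illustration. The proposal step is passed in as a function, so a random-mutation baseline and an LLM-backed proposer (one that formats the scored parents into a prompt and parses candidate sequences from the reply) could share the same loop.

```python
import random

MONOMERS = "AB"  # assumed two-letter alphabet, for illustration only


def toy_fitness(seq: str) -> float:
    """Hypothetical stand-in objective: fraction of adjacent pairs that differ.

    A real study would score a sequence by simulating self-assembly and
    comparing the resulting morphology with the target.
    """
    return sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random monomer."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(MONOMERS) + seq[i + 1:]


def propose_random(parents, n_children, rng):
    """Baseline proposal step: mutate randomly chosen parents.

    `parents` is a list of (sequence, score) pairs. An LLM-driven variant
    would replace this function and return sequences parsed from a reply.
    """
    return [mutate(rng.choice(parents)[0], rng) for _ in range(n_children)]


def evolve(fitness, propose, length=20, pop_size=16, generations=30, seed=0):
    """Elitist evolutionary loop: keep the top half, add proposed children."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(MONOMERS) for _ in range(length))
        for _ in range(pop_size)
    ]
    scored = sorted(
        ((s, fitness(s)) for s in population), key=lambda p: p[1], reverse=True
    )
    history = [scored[0][1]]  # best score per generation
    for _ in range(generations):
        parents = scored[: pop_size // 2]
        children = propose(parents, pop_size, rng)
        candidates = parents + [(c, fitness(c)) for c in children]
        scored = sorted(candidates, key=lambda p: p[1], reverse=True)[:pop_size]
        history.append(scored[0][1])
    return scored[0], history
```

Calling `(best_seq, best_score), history = evolve(toy_fitness, propose_random)` returns the best sequence found and the best score per generation. Because the parents are carried over unchanged, the best score never decreases from one generation to the next.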
About the journal:
npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches for the design of new materials and enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and the refinement of current approaches that support these aims, as well as experimental papers that complement computational findings.
Key features of npj Computational Materials include a two-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround of 11 days from submission to the first editorial decision. The journal is indexed in databases and services including Chemical Abstracts Service (CAS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.