Digital Discovery: latest articles

ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-05-17 DOI: 10.1039/D4DD00013G
Alireza Ghafarollahi and Markus J. Buehler

Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, which limits their flexibility when the design process must incorporate out-of-domain knowledge or requires comprehensive data analysis. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures, and obtaining new first-principles data – natural vibrational frequencies – via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, together with their capacity for autonomous collaboration through the dynamic LLM-based multi-agent environment, unlocks the potential of LLMs for addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.

Citations: 0
Mining patents with large language models elucidates the chemical function landscape
Pub Date : 2024-05-07 DOI: 10.1039/D4DD00011K
Clayton W. Kosonocky, Claus O. Wilke, Edward M. Marcotte and Andrew D. Ellington

The fundamental goal of small molecule discovery is to generate chemicals with target functionality. While this often proceeds through structure-based methods, we set out to investigate the practicality of methods that leverage the extensive corpus of chemical literature. We hypothesize that a sufficiently large text-derived chemical function dataset would mirror the actual landscape of chemical functionality. Such a landscape would implicitly capture complex physical and biological interactions given that chemical function arises from both a molecule's structure and its interacting partners. To evaluate this hypothesis, we built a Chemical Function (CheF) dataset of patent-derived functional labels. This dataset, comprising 631 K molecule–function pairs, was created using an LLM- and embedding-based method to obtain 1.5 K unique functional labels for approximately 100 K randomly selected molecules from their corresponding 188 K unique patents. We carry out a series of analyses demonstrating that the CheF dataset contains a semantically coherent textual representation of the functional landscape congruent with chemical structural relationships, thus approximating the actual chemical function landscape. We then demonstrate through several examples that this text-based functional landscape can be leveraged to identify drugs with target functionality using a model able to predict functional profiles from structure alone. We believe that functional label-guided molecular discovery may serve as an alternative approach to traditional structure-based methods in the pursuit of designing novel functional molecules.

Citations: 0
iSIM: instant similarity
Pub Date : 2024-05-07 DOI: 10.1039/D4DD00041B
Kenneth López-Pérez, Taewon D. Kim and Ramón Alain Miranda-Quintana

The quantification of molecular similarity has been part of cheminformatics since its beginning. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to calculating the similarity of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all the pairwise comparisons need to be computed, so the computational cost scales quadratically with the number of molecules. Here we propose an exact alternative to this problem: iSIM (instant similarity). iSIM compares multiple molecules at the same time and yields the same value as the average of the pairwise comparisons for molecules represented by binary fingerprints and real-value descriptors. In this work, we introduce the mathematical framework and several applications of iSIM in chemical sampling, visualization, diversity selection, and clustering.
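
The linear-scaling idea can be illustrated with a minimal Python sketch. It assumes binary fingerprints and a Tanimoto-style index built from counts pooled over all pairs; the function names and the toy fingerprints are hypothetical, and this is an illustration of the column-sum trick rather than the authors' implementation. The quadratic version visits every pair, while the linear version recovers the same pooled counts from the number of molecules that set each bit.

```python
from fractions import Fraction
from itertools import combinations


def pooled_tanimoto_pairwise(fps):
    """O(N^2) reference: pool shared and mismatched on-bits over every pair."""
    shared = 0      # bits set in both fingerprints of a pair
    mismatched = 0  # bits set in exactly one fingerprint of a pair
    for x, y in combinations(fps, 2):
        shared += sum(a & b for a, b in zip(x, y))
        mismatched += sum(a ^ b for a, b in zip(x, y))
    return Fraction(shared, shared + mismatched)


def pooled_tanimoto_columnwise(fps):
    """O(N) alternative: the same pooled counts from per-bit column sums."""
    n = len(fps)
    shared = 0
    mismatched = 0
    for column in zip(*fps):
        k = sum(column)              # molecules with this bit set
        shared += k * (k - 1) // 2   # pairs in which both molecules set the bit
        mismatched += k * (n - k)    # pairs in which exactly one sets the bit
    return Fraction(shared, shared + mismatched)


# Hypothetical 5-bit fingerprints for four molecules.
fingerprints = [
    (1, 0, 1, 1, 0),
    (1, 1, 0, 1, 0),
    (0, 1, 1, 1, 0),
    (1, 0, 0, 1, 1),
]
print(pooled_tanimoto_pairwise(fingerprints))
print(pooled_tanimoto_columnwise(fingerprints))
```

Both functions return the same fraction because each bit position contributes k(k-1)/2 matching pairs and k(N-k) mismatching pairs, where k is the number of molecules with that bit set.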

Citations: 0
Correction: Predicting small molecules solubility on endpoint devices using deep ensemble neural networks
Pub Date : 2024-05-03 DOI: 10.1039/D4DD90020K
Mayk Caldas Ramos and Andrew D. White

Correction for ‘Predicting small molecules solubility on endpoint devices using deep ensemble neural networks’ by Mayk Caldas Ramos and Andrew D. White, Digital Discovery, 2024, 3, 786–795, https://doi.org/10.1039/D3DD00217A.

Citations: 0
A reproducibility study of atomistic line graph neural networks for materials property prediction
Pub Date : 2024-04-30 DOI: 10.1039/D4DD00064A
Kangming Li, Brian DeCost, Kamal Choudhary and Jason Hattrick-Simpers

The use of machine learning has become increasingly popular in materials science as data-driven materials discovery becomes the new paradigm. Reproducibility of findings is paramount for promoting transparency and accountability in research and building trust in the scientific community. Here we conduct a reproducibility analysis of the work by K. Choudhary and B. DeCost [npj Comput. Mater., 7, 2021, 185], in which a new graph neural network architecture was developed with improved performance on multiple atomistic prediction tasks. We examine the reproducibility of the model performance on 29 regression tasks and of an ablation analysis of the graph neural network layers. We find that the reproduced results generally exhibit good quantitative agreement with the initial study, despite minor disparities in model performance and training efficiency that may result from factors such as hardware differences and the stochasticity involved in model training and data splits. The ease of conducting these reproducibility experiments confirms the great benefits of the open data and code practices to which the initial work adhered. We also discuss some further enhancements in reproducible practices, such as code and data archiving and providing the data identifiers used in dataset splits.
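
The recommendation to publish the data identifiers used in dataset splits can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the procedure used in the study: the function name, the split fractions, and the sample identifiers are hypothetical. The split depends only on the set of identifiers and the seed, and the resulting identifier lists can be archived alongside the code.

```python
import json
import random


def split_with_ids(sample_ids, seed, train_frac=0.8, val_frac=0.1):
    """Return a train/val/test split that records the exact identifiers used."""
    ids = sorted(sample_ids)       # canonical order, independent of input order
    rng = random.Random(seed)      # private generator, so global state is untouched
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "seed": seed,
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Hypothetical identifiers; a real dataset would supply its own.
sample_ids = [f"sample-{i:03d}" for i in range(20)]
split = split_with_ids(sample_ids, seed=123)

# Serializing the split makes the identifiers easy to archive and share.
print(json.dumps(split, indent=2))
```

Archiving the identifier lists themselves, rather than only the seed, guards against differences in random number generators across library versions.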

Citations: 0
Accelerated screening of gas diffusion electrodes for carbon dioxide reduction
Pub Date : 2024-04-30 DOI: 10.1039/D4DD00061G
Ryan J. R. Jones, Yungchieh Lai, Dan Guevarra, Kevin Kan, Joel A. Haber and John M. Gregoire

The electrochemical conversion of carbon dioxide to chemicals and fuels is expected to be a key sustainability technology. Electrochemical carbon dioxide reduction technologies are challenged by several factors, including the limited solubility of carbon dioxide in aqueous electrolyte as well as the difficulty in utilizing polymer electrolytes. These considerations have driven system designs to incorporate gas diffusion electrodes (GDEs) to bring the electrocatalyst in contact with both a gaseous reactant/product stream and a liquid electrolyte. GDE optimization typically results from manual tuning by select experts. Automated preparation and operation of GDE cells could be a watershed for the systematic study of catalyst discovery and system optimization, and ultimately for the development of a materials acceleration platform (MAP) for them. Toward this end, we present the automated GDE (AutoGDE) testing system. Given a catalyst-coated GDE, AutoGDE automates the insertion of the GDE into an electrochemical cell, the liquid and gas handling, the quantification of gaseous reaction products via online mass spectroscopy, and the archiving of the liquid electrolyte for subsequent analysis.

Citations: 0
Towards equilibrium molecular conformation generation with GFlowNets
Pub Date : 2024-04-29 DOI: 10.1039/D4DD00023D
Alexandra Volokhova, Michał Koziarski, Alex Hernández-García, Cheng-Hao Liu, Santiago Miret, Pablo Lemos, Luca Thiede, Zichao Yan, Alán Aspuru-Guzik and Yoshua Bengio

Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNets for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for drug-like molecules. We demonstrate that GFlowNets can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.
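
Sampling proportionally to the Boltzmann distribution means that a conformation with energy E_i should appear with probability proportional to exp(-E_i / (k_B T)). The Python sketch below computes these target probabilities for a handful of conformer energies. It only illustrates the target distribution and does not train or use a GFlowNet; the function name, the energies, and the default temperature are assumptions chosen for illustration.

```python
import math

K_B_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_probabilities(energies_kcal_per_mol, temperature_k=298.15):
    """Return normalized Boltzmann probabilities for a list of conformer energies."""
    beta = 1.0 / (K_B_KCAL_PER_MOL_K * temperature_k)
    e_min = min(energies_kcal_per_mol)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_kcal_per_mol]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical relative energies (kcal/mol) of three conformers.
probs = boltzmann_probabilities([0.0, 0.5, 2.0])
print(probs)
```

Shifting all energies by the minimum leaves the probabilities unchanged, because the shift cancels in the normalization, and it avoids overflow for large energy values.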

Citations: 0
A versatile optimization framework for porous electrode design
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-04-25 DOI: 10.1039/D3DD00247K
Maxime van der Heijden, Gabor Szendrei, Victor de Haas and Antoni Forner-Cuenca

Porous electrodes are performance-defining components in electrochemical devices, such as redox flow batteries, as they govern the electrochemical performance and pumping demands of the reactor. Yet, conventional porous electrodes used in redox flow batteries are not tailored to sustain convection-enhanced electrochemical reactions. Thus, there is a need for electrode optimization to enhance the system performance. In this work, we present an optimization framework to carry out the bottom-up design of porous electrodes by coupling a genetic algorithm with a pore network modeling framework. We introduce geometrical versatility by adding a pore merging and splitting function, study the impact of various optimization parameters, geometrical definitions, and objective functions, and incorporate conventional electrode and flow field designs. Moreover, we show the need for optimizing geometries for specific reactor architectures and operating conditions to design next-generation electrodes, by analyzing the genetic algorithm optimization for initial starting geometries with diverse morphologies (cubic and a tomography-extracted commercial electrode), flow field designs (flow-through and interdigitated), and redox chemistries (VO²⁺/VO₂⁺ and TEMPO/TEMPO⁺). We found that for kinetically sluggish electrolytes with high ionic conductivity, electrodes with numerous small pores and high internal surface area provide enhanced performance, whereas for kinetically facile electrolytes with low ionic conductivity, low through-plane tortuosity and high hydraulic conductance are desired. The computational tool developed in this work can be further extended to the design of high-performance electrode materials for a broad range of operating conditions, electrolyte chemistries, reactor designs, and electrochemical technologies.
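
The general shape of a genetic algorithm loop, as coupled to a performance model in this kind of framework, can be sketched in Python as follows. Everything here is a toy under stated assumptions: `toy_objective` is a hypothetical stand-in for a pore network model, the pore radii are in arbitrary units, and the pore merging and splitting operators described in the abstract are not reproduced.

```python
import random


def toy_objective(pore_radii):
    """Hypothetical stand-in for a pore network model: reward surface area
    (small pores) while penalizing pumping losses (very narrow pores)."""
    surface_term = sum(1.0 / r for r in pore_radii)
    pumping_term = sum(1.0 / r**4 for r in pore_radii)
    return surface_term - 0.01 * pumping_term


def evolve(n_pores=8, pop_size=20, generations=30, seed=0):
    """Run a small genetic algorithm and return the best design and its history."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.2, 2.0) for _ in range(n_pores)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_objective, reverse=True)
        history.append(toy_objective(population[0]))
        parents = population[:pop_size // 2]   # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_pores)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_pores)         # mutate a single pore radius
            child[i] = min(2.0, max(0.2, child[i] * rng.uniform(0.8, 1.25)))
            children.append(child)
        population = parents + children
    population.sort(key=toy_objective, reverse=True)
    history.append(toy_objective(population[0]))
    return population[0], history


best, history = evolve(seed=1)
print(history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best objective value never decreases from one generation to the next. A real framework would replace the toy objective with a call to the physics model and would likely use richer variation operators.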

Citations: 0
Molecular graph transformer: stepping beyond ALIGNN into long-range interactions
Pub Date : 2024-04-23 DOI: 10.1039/D4DD00014E
Marco Anselmi, Greg Slabaugh, Rachel Crespo-Otero and Devis Di Tommaso

Graph Neural Networks (GNNs) have revolutionized material property prediction by learning directly from the structural information of molecules and materials. However, conventional GNN models rely solely on local atomic interactions, such as bond lengths and angles, neglecting crucial long-range electrostatic forces that affect certain properties. To address this, we introduce the Molecular Graph Transformer (MGT), a novel GNN architecture that combines local attention mechanisms with message passing on both bond graphs and their line graphs, explicitly capturing long-range interactions. Benchmarking on MatBench and Quantum MOF (QMOF) datasets demonstrates that MGT's improved understanding of electrostatic interactions significantly enhances the prediction accuracy of properties like exfoliation energy and refractive index, while maintaining state-of-the-art performance on all other properties. This breakthrough paves the way for the development of highly accurate and efficient materials design tools across diverse applications.
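
The relationship between a bond graph and its line graph can be shown with a short Python sketch. In the line graph, each node is a bond of the original graph, and two nodes are connected when the corresponding bonds share an atom, so each connection corresponds to a bond angle. The function name and the toy molecule are hypothetical, and the sketch covers only the graph construction, not message passing or attention.

```python
from itertools import combinations


def line_graph(bonds):
    """Return line-graph edges: pairs of bonds that share an atom."""
    bonds = [tuple(sorted(b)) for b in bonds]  # canonical (low, high) atom order
    edges = []
    for b1, b2 in combinations(bonds, 2):
        if set(b1) & set(b2):                  # a shared atom defines a bond angle
            edges.append((b1, b2))
    return edges


# Water-like toy molecule: atom 0 is O, atoms 1 and 2 are H.
water_bonds = [(0, 1), (0, 2)]
print(line_graph(water_bonds))
```

For the two O-H bonds above, the line graph has a single edge, which corresponds to the H-O-H angle.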

Citations: 0
Learning peptide properties with positive examples only
Pub Date : 2024-04-19 DOI: 10.1039/D3DD00218G
Mehrad Ansari and Andrew D. White

Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use the two learning strategies of adapting the base classifier and reliable negative identification to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequences. We evaluate the predictive performance of our PU learning method and show that, using only the positive data, it can achieve competitive performance when compared with the classical positive–negative (PN) classification approach, where there is access to both positive and negative examples.

Citations: 0