Digital Discovery: latest articles

OM-Diff: inverse-design of organometallic catalysts with guided equivariant denoising diffusion
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-23 DOI: 10.1039/D4DD00099D
François Cornet, Bardi Benediktsson, Bjarke Hastrup, Mikkel N. Schmidt and Arghya Bhowmik

Organometallic complexes are ubiquitous in numerous technological applications, and in particular in homogeneous catalysis. Optimization of such complexes for specific applications is challenging due to the large variety of possible metal–ligand combinations and ligand–ligand interactions. Here we present OM-Diff, an inverse-design framework based on a diffusion generative model for in silico design of such complexes. Due to the importance of the spatial structure of a catalyst, the model operates on all-atom (including H) representations in 3D space. To handle the symmetries inherent to that data representation, OM-Diff combines an equivariant diffusion model with an equivariant property predictor. The diffusion model generates ligands conditioned on a specified metal-center, while the property predictor guides the generation towards novel complexes with desired properties. We demonstrate the potential of OM-Diff by designing optimized catalysts for a family of cross-coupling reactions, and validating a selection of novel proposed compounds with DFT calculations.
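
The symmetry requirement in this abstract can be illustrated with a toy check. The following is a minimal sketch, not the OM-Diff architecture: `toy_update`, its distance-based weight, and the coordinates are hypothetical choices made for illustration. It shows what rotational equivariance means for a function acting on 3D atom positions: rotating the input and then applying the function gives the same result as applying the function and then rotating.

```python
import math


def rotate_z(points, angle):
    """Rotate a list of 3D points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def toy_update(points):
    """Hypothetical equivariant layer: shift each atom along the difference
    vectors to all other atoms, weighted by a function of distance only."""
    updated = []
    for i, pi in enumerate(points):
        shift = [0.0, 0.0, 0.0]
        for j, pj in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(pi, pj)]
            dist = math.sqrt(sum(d * d for d in diff))
            weight = 1.0 / (1.0 + dist * dist)  # invariant under rotation
            for k in range(3):
                shift[k] += weight * diff[k]
        updated.append(tuple(p + s for p, s in zip(pi, shift)))
    return updated


def max_abs_difference(a, b):
    """Largest coordinate-wise gap between two point lists."""
    return max(abs(x - y) for p, q in zip(a, b) for x, y in zip(p, q))


# Made-up coordinates for four atoms.
atoms = [(0.0, 0.0, 0.0), (1.1, 0.2, -0.3), (-0.4, 0.9, 0.5), (0.3, -0.8, 1.2)]
angle = 0.7

rotate_then_update = toy_update(rotate_z(atoms, angle))
update_then_rotate = rotate_z(toy_update(atoms), angle)
equivariance_gap = max_abs_difference(rotate_then_update, update_then_rotate)
```

The gap is zero up to floating-point error because the weights depend only on interatomic distances, which rotations preserve, while the difference vectors rotate together with the input.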

What can attribution methods show us about chemical language models?
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-18 DOI: 10.1039/D4DD00084F
Stefan Hödl, Tal Kachman, Yoram Bachrach, Wilhelm T. S. Huck and William E. Robinson

Language models trained on molecular string representations have shown strong performance in predictive and generative tasks. However, practical applications require not only making accurate predictions, but also explainability – the ability to explain the reasons and rationale behind the predictions. In this work, we explore explainability for a chemical language model by adapting a transformer-specific and a model-agnostic input attribution technique. We fine-tune a pretrained model to predict aqueous solubility, compare training and architecture variants, and evaluate visualizations of attributed relevance. The model-agnostic SHAP technique provides sensible attributions, highlighting the positive influence of individual electronegative atoms, but does not explain the model in terms of functional groups or explain how the model represents molecular strings internally to make predictions. In contrast, the adapted transformer-specific explainability technique produces sparse attributions, which cannot be directly attributed to functional groups relevant to solubility. Instead, the attributions are more characteristic of how the model maps molecular strings to its latent space, which seems to represent features relevant to molecular similarity rather than functional groups. These findings provide insight into the representations underpinning chemical language models, which we propose may be leveraged for the design of informative chemical spaces for training more accurate, advanced and explainable models.
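
SHAP attributions are based on Shapley values, so a small exact computation may help make the idea of input attribution concrete. This is a minimal sketch under stated assumptions: `toy_model`, the `TOKEN_EFFECT` table, and the interaction term are invented for illustration and are not the chemical language model or the solubility data from the paper. The SHAP library approximates these values rather than enumerating every coalition as done here.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value_fn(frozenset(subset) | {f})
                without_f = value_fn(frozenset(subset))
                total += weight * (with_f - without_f)
        values[f] = total
    return values


# Hypothetical per-token contributions to a made-up solubility score.
TOKEN_EFFECT = {"O": 1.0, "N": 0.6, "C": -0.2}


def toy_model(present_tokens):
    """Made-up model: additive token effects plus one O/N interaction."""
    score = sum(TOKEN_EFFECT[t] for t in present_tokens)
    if "O" in present_tokens and "N" in present_tokens:
        score += 0.4  # invented interaction term
    return score


tokens = ["C", "O", "N"]
attributions = shapley_values(tokens, toy_model)
```

The attributions sum to the difference between the model output with all tokens present and with none, and the interaction term is split evenly between "O" and "N". Attributions of this kind are per-token, which is consistent with the abstract's remark that they do not by themselves explain the model in terms of functional groups.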

Universal neural network potentials as descriptors: towards scalable chemical property prediction using quantum and classical computers
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-16 DOI: 10.1039/D4DD00098F
Tomoya Shiota, Kenji Ishihara and Wataru Mizukami

Accurate prediction of diverse chemical properties is crucial for advancing molecular design and materials discovery. Here we present a versatile approach that uses the intermediate information of a universal neural network potential as a general-purpose descriptor for chemical property prediction. Our method is based on the insight that a sophisticated neural network architecture trained as a universal force field learns transferable representations of atomic environments. We show that transfer learning with graph neural network potentials such as M3GNet and MACE achieves accuracy comparable to state-of-the-art methods for predicting NMR chemical shifts, using quantum machine learning as well as a standard classical regression model, despite the compactness of its descriptors. In particular, the MACE descriptor demonstrates the highest accuracy to date on the ¹³C NMR chemical shift benchmarks for drug molecules. This work provides an efficient way to accurately predict properties, potentially accelerating the discovery of new molecules and materials.
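
The transfer-learning pattern described here (frozen features from a pretrained model, followed by a simple regressor) can be sketched in a few lines. This is an illustrative sketch only: `frozen_descriptor` is a hypothetical stand-in for the hidden-layer output of a pretrained potential and does not call M3GNet or MACE, and the data are synthetic. Ridge regression is used as one example of a standard classical regression model.

```python
import math


def frozen_descriptor(environment):
    """Hypothetical stand-in for hidden features of a pretrained potential."""
    distance, charge = environment
    return [math.exp(-distance), charge * charge, 1.0]  # last entry is a bias


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ridge(features, targets, alpha):
    """Ridge weights from the normal equations (X^T X + alpha I) w = X^T y."""
    dim = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) + (alpha if i == j else 0.0)
             for j in range(dim)] for i in range(dim)]
    moment = [sum(f[i] * t for f, t in zip(features, targets))
              for i in range(dim)]
    return solve_linear(gram, moment)


# Synthetic data: the target is linear in the descriptor by construction.
train_envs = [(0.5, 0.1), (1.0, -0.3), (1.5, 0.4), (2.0, 0.0), (0.8, 0.6)]
true_weights = [3.0, -2.0, 0.5]
train_x = [frozen_descriptor(e) for e in train_envs]
train_y = [sum(w * f for w, f in zip(true_weights, x)) for x in train_x]

weights = fit_ridge(train_x, train_y, alpha=1e-9)
test_x = frozen_descriptor((1.2, 0.25))
prediction = sum(w * f for w, f in zip(weights, test_x))
expected = sum(w * f for w, f in zip(true_weights, test_x))
```

Only the small linear model is fitted; the descriptor function stays fixed, which is the sense in which the representation is transferred rather than retrained.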

Co-orchestration of multiple instruments to uncover structure–property relationships in combinatorial libraries
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-15 DOI: 10.1039/D4DD00109E
Boris N. Slautin, Utkarsh Pratiush, Ilia N. Ivanov, Yongtao Liu, Rohit Pant, Xiaohang Zhang, Ichiro Takeuchi, Maxim A. Ziatdinov and Sergei V. Kalinin

The rapid growth of automated and autonomous instrumentation brings forth opportunities for the co-orchestration of multimodal tools that are equipped with multiple sequential detection methods or several characterization techniques to explore identical samples. This is exemplified by combinatorial libraries that can be explored in multiple locations via multiple tools simultaneously or downstream characterization in automated synthesis systems. In co-orchestration approaches, information gained in one modality should accelerate the discovery of other modalities. Correspondingly, an orchestrating agent should select the measurement modality based on the anticipated knowledge gain and measurement cost. Herein, we propose and implement a co-orchestration approach for conducting measurements with complex observables, such as spectra or images. The method relies on combining dimensionality reduction by variational autoencoders with representation learning for control over the latent space structure and integration into an iterative workflow via multi-task Gaussian Processes (GPs). This approach further allows for the native incorporation of the system's physics via a probabilistic model as a mean function of the GPs. We illustrate this method for different modes of piezoresponse force microscopy and micro-Raman spectroscopy on a combinatorial Sm-BiFeO₃ library. However, the proposed framework is general and can be extended to multiple measurement modalities and arbitrary dimensionality of the measured signals.
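
The selection rule stated in this abstract (choose the modality from anticipated knowledge gain and measurement cost) can be sketched as a simple scoring step. This is a hedged illustration, not the authors' implementation: using predictive variance as a proxy for knowledge gain, dividing it by cost, and all numbers below are assumptions. In the paper the uncertainties would come from multi-task Gaussian processes rather than a hand-written table.

```python
def select_measurement(uncertainty, cost):
    """Return the (modality, location) with the highest variance per unit cost.

    `uncertainty` maps modality -> {location: predictive variance};
    `cost` maps modality -> cost of one measurement (arbitrary units).
    """
    best = None
    for modality, per_location in uncertainty.items():
        for location, variance in per_location.items():
            score = variance / cost[modality]
            if best is None or score > best[0]:
                best = (score, modality, location)
    return best[1], best[2]


# Made-up predictive variances at three library locations for two modalities.
uncertainty = {
    "PFM": {0: 0.10, 1: 0.40, 2: 0.25},
    "Raman": {0: 0.90, 1: 0.30, 2: 0.60},
}
cost = {"PFM": 1.0, "Raman": 3.0}  # assumed relative costs

choice = select_measurement(uncertainty, cost)
```

With these numbers the Raman measurement at location 0 has the largest raw variance, but its higher assumed cost makes the PFM measurement at location 1 the better choice per unit cost.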

Automated prediction of ground state spin for transition metal complexes
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-12 DOI: 10.1039/D4DD00093E
Yuri Cho, Ruben Laplaza, Sergi Vela and Clémence Corminboeuf

Exploiting crystallographic data repositories for large-scale quantum chemical computations requires the rapid and accurate extraction of the molecular structure, charge and spin from the crystallographic information file. Here, we develop a general approach to assign the ground state spin of transition metal complexes, in complement to our previous efforts on determining metal oxidation states and bond order within the cell2mol software. Starting from a database of 31k transition metal complexes extracted from the Cambridge Structural Database with cell2mol, we construct the TM-GSspin dataset, which contains 2063 mononuclear first row transition metal complexes and their computed ground state spins. TM-GSspin is highly diverse in terms of metals, metal oxidation states, coordination geometries, and coordination sphere compositions. Based on TM-GSspin, we identify correlations between structural and electronic features of the complexes and their ground state spins to develop a rule-based spin state assignment model. Leveraging this knowledge, we construct interpretable descriptors and build a statistical model achieving 98% cross-validated accuracy in predicting the ground state spin across the board. Our approach provides a practical way to determine the ground state spin of transition metal complexes directly from crystal structures without additional computations, thus enabling the automated use of crystallographic data for large-scale computations involving transition metal complexes.
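
To make the prediction target concrete, the sketch below enumerates the two limiting spin states for an octahedral d-electron count. It is a textbook crystal-field illustration only, not the rule-based or statistical model described in the abstract; the function names are hypothetical, and real complexes can deviate from these limits (other geometries, intermediate spin states).

```python
def unpaired_electrons_octahedral(d_count, high_spin):
    """Unpaired d electrons for an idealized octahedral complex."""
    if not 0 <= d_count <= 10:
        raise ValueError("d electron count must be between 0 and 10")
    if high_spin:
        # Fill all five d orbitals singly before pairing.
        return d_count if d_count <= 5 else 10 - d_count
    # Low spin: fill the three t2g orbitals completely before eg.
    if d_count <= 3:
        return d_count
    if d_count <= 6:
        return 6 - d_count
    if d_count <= 8:
        return d_count - 6
    return 10 - d_count


def spin_multiplicity(unpaired):
    """Multiplicity 2S + 1 with S = unpaired / 2."""
    return unpaired + 1


# Fe(II) is d6: the two limits are a quintet and a singlet.
fe2_high = spin_multiplicity(unpaired_electrons_octahedral(6, high_spin=True))
fe2_low = spin_multiplicity(unpaired_electrons_octahedral(6, high_spin=False))
```

Which of the candidate multiplicities is the actual ground state depends on the metal, oxidation state, and ligand field, which is the assignment problem the paper addresses.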

CatScore: evaluating asymmetric catalyst design at high efficiency
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-11 DOI: 10.1039/D4DD00114A
Bing Yan and Kyunghyun Cho

Asymmetric catalysis plays a crucial role in advancing medicine and materials science. However, the prevailing experiment-driven methods for catalyst evaluation are both resource-heavy and time-consuming. To address this challenge, we present CatScore – a learning-centric metric designed for the automatic evaluation of catalyst design models at both instance and system levels. This approach harnesses the power of deep learning to predict product selectivity as a function of reactants and the proposed catalyst. The predicted selectivity serves as a quantitative score, enabling a swift and precise assessment of a catalyst's activity. On an instance level, CatScore's predictions correlate closely with experimental outcomes, demonstrating a Spearman's ρ = 0.84, which surpasses the density functional theory (DFT) based linear free energy relationships (LFERs) metric with ρ = 0.55 and round-trip accuracy metrics at ρ = 0.24. Importantly, when ranking catalyst candidates, CatScore achieves a mean reciprocal ranking significantly superior to traditional LFER methods, marking a considerable reduction in labor and time investments needed to find top-performing catalysts.
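
The two evaluation metrics named in this abstract, Spearman's ρ and mean reciprocal rank, can be computed as follows. This is a minimal sketch with made-up numbers; it does not reproduce the paper's data or its reported values, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_reciprocal_rank(first_hit_ranks):
    """Mean of 1/rank, where rank is the position of the first correct item."""
    return sum(1.0 / r for r in first_hit_ranks) / len(first_hit_ranks)


# Made-up predicted and measured selectivities for four catalysts.
predicted = [0.9, 0.2, 0.5, 0.7]
measured = [0.8, 0.1, 0.6, 0.4]
rho = spearman_rho(predicted, measured)

# Made-up ranks at which the best catalyst appeared in three queries.
mrr = mean_reciprocal_rank([1, 2, 4])
```

Spearman's ρ depends only on the ordering of the values, which suits a score intended for ranking candidates; mean reciprocal rank rewards placing the best candidate near the top of each ranked list.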

Towards informatics-driven design of nuclear waste forms
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-09 DOI: 10.1039/D4DD00096J
Vinay I. Hegde, Miroslava Peterson, Sarah I. Allec, Xiaonan Lu, Thiruvillamalai Mahadevan, Thanh Nguyen, Jayani Kalahe, Jared Oshiro, Robert J. Seffens, Ethan K. Nickerson, Jincheng Du, Brian J. Riley, John D. Vienna and James E. Saal

Informatics-driven approaches, such as machine learning and sequential experimental design, have shown the potential to drastically impact next-generation materials discovery and design. In this perspective, we present a few guiding principles for applying informatics-based methods towards the design of novel nuclear waste forms. We advocate for adopting a system design approach, and describe the effective usage of data-driven methods in every stage of such a design process. We demonstrate how this approach can optimally leverage physics-based simulations, machine learning surrogates, and experimental synthesis and characterization, within a feedback-driven closed-loop sequential learning framework. We discuss the importance of incorporating domain knowledge into the representation of materials, the construction and curation of datasets, the development of predictive property models, and the design and execution of experiments. We illustrate the application of this approach by successfully designing and validating Na- and Nd-containing phosphate-based ceramic waste forms. Finally, we discuss open challenges in such informatics-driven workflows and present an outlook for their widespread application for the cleanup of nuclear wastes.

Machine learning of stability scores from kinetic data
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-01 DOI: 10.1039/D4DD00036F
Veerupaksh Singla, Qiyuan Zhao and Brett M. Savoie

The absence of computational methods to predict stressor-specific degradation susceptibilities represents a significant and costly challenge to the introduction of new materials into applications. Here, a machine-learning framework is developed that predicts stressor-specific stability scores from computationally generated reaction data. The thermal degradation of alkanes was studied as an exemplary system to demonstrate the approach. The half-lives of ∼32k alkanes were simulated under pyrolysis conditions using 59 model reactions. Using a hinge-loss function, these half-life data were used to train machine learning models to predict a scalar representing the relative stability based only on the molecular graph. These models were successful in transferability case studies using distinct training and testing splits to recapitulate known stability trends with respect to the degree of branching and alkane size. Even the simplest models showed excellent performance in these case studies, demonstrating the relative ease with which thermal stability can be learned. The stability score is also shown to be useful in a design study, where it is used as part of the objective function of a genetic algorithm to guide the search for more stable species. This work provides a framework for converting kinetic reaction data into stability scores that provide actionable design information and opens avenues for exploring more complex chemistries and stressors.
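
One way a hinge loss can turn half-life data into a relative stability score is a pairwise ranking objective. The sketch below is an assumption about the general form, not the loss used in the paper: the pairwise formulation, the margin of 1.0, and the numbers are all illustrative. It penalizes any pair of molecules in which the one with the longer half-life is not scored higher by at least the margin.

```python
def pairwise_hinge_loss(scores, half_lives, margin=1.0):
    """Mean hinge penalty over ordered pairs with half_lives[i] > half_lives[j].

    A pair contributes zero once scores[i] exceeds scores[j] by `margin`.
    """
    total = 0.0
    pairs = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if half_lives[i] > half_lives[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Made-up half-lives (arbitrary units) for three molecules.
half_lives = [10.0, 3.0, 1.0]

consistent_scores = [2.0, 0.5, -1.0]  # ordering matches the half-lives
reversed_scores = [-1.0, 0.5, 2.0]    # ordering is inverted

loss_consistent = pairwise_hinge_loss(consistent_scores, half_lives)
loss_reversed = pairwise_hinge_loss(reversed_scores, half_lives)
```

Because only score differences enter the loss, the learned scalar is meaningful as a relative ranking rather than as an absolute half-life, which matches the abstract's description of a score representing relative stability.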

Deep-learning enabled photonic nanostructure discovery in arbitrarily large shape sets via linked latent space representation learning†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-07-01 DOI: 10.1039/D4DD00107A
Sudhanshu Singh, Rahul Kumar, Soumyashree S. Panda and Ravi S. Hegde

The vast array of shapes achievable through modern nanofabrication technologies presents a challenge in selecting the most optimal design for achieving a desired optical response. While data-driven techniques, such as deep learning, hold promise for inverse design, their applicability is often limited as they typically explore only smaller subsets of the extensive range of shapes feasible with nanofabrication. Additionally, these models are often regarded as ‘black boxes,’ lacking transparency in revealing the underlying relationship between the shape and optical response. Here, we introduce a methodology tailored to address the challenges posed by large, complex, and diverse sets of nanostructures. Specifically, we demonstrate our approach in the context of periodic silicon metasurfaces operating in the visible wavelength range, considering large and diverse shape set variations. Our paired variational autoencoder method facilitates the creation of rich, continuous, and parameter-aligned latent space representations of the shape–response relationship. We showcase the practical utility of our approach in two key areas: (1) enabling multiple-solution inverse design and (2) conducting sensitivity analyses on a shape's optical response to nanofabrication-induced distortions. This methodology represents a significant advancement in data-driven design techniques, further unlocking the application potential of nanophotonics.

Physics-driven discovery and bandgap engineering of hybrid perovskites†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-06-28 DOI: 10.1039/D4DD00080C
Sheryl L. Sanchez, Elham Foadian, Maxim Ziatdinov, Jonghee Yang, Sergei V. Kalinin, Yongtao Liu and Mahshid Ahmadi

The unique aspect of hybrid perovskites is their tunability, allowing the engineering of the bandgap via substitution. From the application viewpoint, this allows creation of tandem cells between perovskites and silicon, or two or more perovskites, with associated increase of efficiency beyond the single-junction Shockley–Queisser limit. However, the concentration dependence of the optical bandgap in hybrid perovskite solid solutions can be non-linear and even non-monotonic, as determined by band alignments between endmembers, presence of defect states and Urbach tails, and phase separation. Exploring new compositions brings forth the joint problem of the discovery of the composition with the desired band gap and establishing the physical model of the band gap concentration dependence. Here we report the development of the experimental workflow based on structured Gaussian Process (sGP) models and custom sGP (c-sGP) that allow the joint discovery of the experimental behavior and the underpinning physical model. This approach is verified with simulated datasets with known ground truth and was found to accelerate the discovery of experimental behavior and the underlying physical model. The d/c-sGP approach utilizes a few calculated thin film bandgap data points to guide targeted explorations, minimizing the number of thin film preparation steps. Through iterative exploration, we demonstrate that the c-sGP algorithm that combined 5 bandgap models converges rapidly, revealing a relationship in the bandgap diagram of MA₁₋ₓGAₓPb(I₁₋ₓBrₓ)₃. This approach offers a promising method for efficiently understanding the physical model of band gap concentration dependence in binary systems, and this method can also be extended to ternary or higher dimensional systems.
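To make the non-linear and non-monotonic concentration dependence mentioned above concrete, the sketch below evaluates the textbook bowing form for the bandgap of a binary solid solution, E_g(x) = (1 − x)·E_A + x·E_B − b·x·(1 − x). This is a generic illustration: the abstract does not list the five bandgap models combined in c-sGP, so whether this form is among them is an assumption, and the endpoint gaps and bowing parameter used here are hypothetical values, not data from the paper.

```python
def bowing_bandgap(x, e_a, e_b, bowing):
    """Bandgap of a binary solid solution at composition x (0 <= x <= 1).

    Linear interpolation between the endmember gaps e_a and e_b,
    minus a quadratic bowing term that vanishes at both endpoints.
    """
    return (1.0 - x) * e_a + x * e_b - bowing * x * (1.0 - x)


def bowing_extremum(e_a, e_b, bowing):
    """Composition where dE_g/dx = 0, or None if it lies outside [0, 1]."""
    if bowing == 0:
        return None  # purely linear dependence has no interior extremum
    x_star = 0.5 * (1.0 - (e_b - e_a) / bowing)
    return x_star if 0.0 <= x_star <= 1.0 else None


# Hypothetical endmember gaps (eV) and bowing parameter (eV).
E_A, E_B, B = 1.6, 2.0, 1.0
x_min = bowing_extremum(E_A, E_B, B)
for x in (0.0, x_min, 1.0):
    print(f"x = {x:.2f}  E_g = {bowing_bandgap(x, E_A, E_B, B):.3f} eV")
```

With these illustrative numbers the gap at the interior extremum falls below both endmember values, which is the kind of non-monotonic behavior that makes a purely linear interpolation between endmembers unreliable.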
