Rolf David, Miguel de la Puente, Axel Gomez, Olaia Anton, Guillaume Stirnemann, Damien Laage
The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets, a critical component for the accuracy of MLIPs, has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels-Alder reaction. These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.
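The configuration-selection step of a concurrent learning loop is often implemented as query by committee: several MLIPs trained on the same data predict forces for an explored geometry, and geometries on which they disagree moderately are sent for labeling. The sketch below assumes that scheme. The abstract does not state ArcaNN's actual selection criterion, and the function names and threshold values here are hypothetical.

```python
import numpy as np


def max_force_deviation(forces):
    """Largest per-atom spread of predicted forces across a committee of models.

    forces has shape (n_models, n_atoms, 3).
    """
    forces = np.asarray(forces, dtype=float)
    mean_force = forces.mean(axis=0)                             # (n_atoms, 3)
    squared_spread = ((forces - mean_force) ** 2).sum(axis=-1)   # (n_models, n_atoms)
    per_atom_deviation = np.sqrt(squared_spread.mean(axis=0))    # (n_atoms,)
    return float(per_atom_deviation.max())


def classify_configuration(deviation, lower=0.1, upper=0.8):
    """Sort a geometry into one of three bins (thresholds are illustrative only)."""
    if deviation < lower:
        return "accurate"    # models agree: nothing new to learn
    if deviation < upper:
        return "candidate"   # moderate disagreement: worth labeling
    return "rejected"        # strong disagreement: likely unphysical geometry


# Three hypothetical models, two atoms: the models agree on atom 0, disagree on atom 1.
committee_forces = np.array([
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.3, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [-0.3, 0.0, 0.0]],
])
deviation = max_force_deviation(committee_forces)
print(round(deviation, 3), classify_configuration(deviation))
```

Candidates selected this way would then be labeled with reference energies and forces and added to the training set before the next iteration.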
Citation: ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials. Digital Discovery, published 2024-10-30. DOI: 10.1039/d4dd00209a. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11563209/pdf/
Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin
Polyolefins (POs) are the largest class of polymers produced worldwide. Despite the intrinsic chemical similarities within this class of polymers, they are often physically incompatible. This combination presents a significant hurdle for high-throughput recycling systems that strive to sort various types of plastics from one another. Previous research has shown that near-infrared (NIR) spectroscopy can sort POs from other plastics, but existing approaches generally fall short of sorting POs from one another. In this work, we enhance NIR spectroscopy-based sortation by screening over 12 000 machine-learning pipelines to enable sorting of PO species beyond what is possible using current NIR databases. These pipelines include a series of scattering corrections, filtering and differentiation, data scaling, dimensionality reduction, and machine learning classifiers. The preprocessing steps examined include scatter correction, linear detrending, and Savitzky–Golay filtering. Dimensionality reduction techniques such as principal component analysis (PCA), functional principal component analysis (fPCA), and uniform manifold approximation and projection (UMAP) were also investigated for classification enhancements. This analysis of preprocessing steps and classification algorithm combinations identified multiple data pipelines capable of successfully sorting PO materials with over 95% accuracy. Through rigorous testing, this study provides recommendations for consistently applying preprocessing and classification techniques without over-complicating the data analysis. This work also provides a set of preprocessing steps, a chosen classifier, and tuned hyperparameters that may be useful for benchmarking new models and data sets. Finally, the approach outlined here is ready to be applied by the developers of materials sortation equipment so that we can improve the value and purity of recycled plastic waste streams.
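To make the pipeline structure concrete, here is a minimal sketch of a single pipeline of the kind described: scatter correction, Savitzky–Golay differentiation, scaling, PCA, and a classifier. It is not one of the pipelines from the study. The choice of standard normal variate (SNV) as the scatter correction, the support vector classifier, every hyperparameter, and the synthetic two-class spectra are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def sg_first_derivative(spectra):
    """Savitzky-Golay smoothing plus first derivative along the wavelength axis."""
    return savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)


def make_synthetic_spectra(n_per_class=60, n_points=200, seed=0):
    """Hypothetical stand-in data: two classes with slightly shifted absorption bands."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    spectra, labels = [], []
    for label, centre in enumerate((0.45, 0.55)):
        band = np.exp(-((x - centre) ** 2) / (2 * 0.05 ** 2))
        for _ in range(n_per_class):
            gain = rng.uniform(0.8, 1.2)      # multiplicative scatter
            offset = rng.uniform(-0.2, 0.2)   # additive baseline
            noise = rng.normal(0.0, 0.01, n_points)
            spectra.append(gain * band + offset + noise)
            labels.append(label)
    return np.array(spectra), np.array(labels)


pipeline = Pipeline([
    ("scatter", FunctionTransformer(snv)),
    ("derivative", FunctionTransformer(sg_first_derivative)),
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=5)),
    ("classify", SVC(kernel="rbf")),
])

X, y = make_synthetic_spectra()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because each stage is a named step, swapping one option for another (for example a different classifier or UMAP in place of PCA) only changes one entry in the list, which is what makes screening thousands of combinations practical.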
Citation: Sorting polyolefins with near-infrared spectroscopy: identification of optimal data analysis pipelines and machine learning classifiers. Digital Discovery, issue 11, pp. 2341-2355, published 2024-10-15. DOI: 10.1039/D4DD00235K. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00235k?page=search
Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann
Ab initio molecular dynamics simulations of material properties have become a cornerstone in the development of novel materials for a wide range of applications such as battery technology and catalysis. Unfortunately, their high computational demand can make them unsuitable in many applications. Consequently, surrogate modeling via neural networks has become an active field of research. Two of the major obstacles to their practical application in many cases are assessing the reliability of the neural network predictions and the difficulty of generating suitable datasets to train the neural network in the first place. Bayesian neural networks offer a promising framework for modeling uncertainty, active learning, and improving data efficiency and robustness by incorporating prior physical knowledge. However, due to the high computational demand and slow convergence of the gold standard approach of Markov chain Monte Carlo (MCMC) sampling methods, variational inference via Monte Carlo dropout is currently the only sampling method successfully applied in this domain. Since MCMC methods have often displayed a superior quality in their uncertainty quantification, developing a suitable MCMC method in this domain would be a significant advance in making neural network-based molecular dynamics simulations more practically viable. In this paper, we demonstrate that convergence for state-of-the-art models with high-quality MCMC methods can still be achieved in a practical amount of time by introducing a novel parameter-specific adaptive step size scheme. In addition, we introduce a new stochastic neural network model based on the NequIP architecture and demonstrate that, when combined with our novel sampling algorithm, we obtain predictions with state-of-the-art accuracy as well as a significantly improved measure of uncertainty over Monte Carlo dropout. Lastly, we show that the proposed algorithm can even outperform deep ensembles while sampling from a single Markov chain.
Citation: High accuracy uncertainty-aware interatomic force modeling with equivariant Bayesian neural networks. Digital Discovery, issue 11, pp. 2356-2366, published 2024-10-15. DOI: 10.1039/D4DD00183D. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00183d?page=search
S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers
By 2035, the need for battery-grade lithium is expected to quadruple. About half of this lithium is currently sourced from brines and must be converted from lithium chloride into lithium carbonate (Li2CO3) through a process called softening. Conventional softening methods using sodium or potassium salts contribute to carbon emissions during reagent mining and battery manufacturing, exacerbating global warming. This study introduces an alternative approach using carbon dioxide (CO2(g)) as the carbonating reagent in the lithium softening process, offering a carbon capture solution. We employed an active learning-driven high-throughput method to rapidly capture CO2(g) and convert it to lithium carbonate. The model was simplified by focusing on the elemental concentrations of C, Li, and N for practical measurement and tracking, avoiding the complexities of ion speciation equilibria. This approach led to an optimized lithium carbonate process that capitalizes on CO2(g) capture and improves the battery metal supply chain's carbon efficiency.
Citation: Artificial intelligence-enabled optimization of battery-grade lithium carbonate production. Digital Discovery, issue 11, pp. 2320-2326, published 2024-10-14. DOI: 10.1039/D4DD00159A. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00159a?page=search
Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow
Correction for ‘A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing’ by Benedikt Winter et al., Digital Discovery, 2022, 1, 859–869, https://doi.org/10.1039/D2DD00058J.
[This corrects the article with DOI: 10.1039/D2DD00058J.]
Citation: Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing. Digital Discovery, issue 11, p. 2384, published 2024-10-14. DOI: 10.1039/D4DD90045F. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472119/pdf/
Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin
Microorganisms are valuable resources as antibiotic producers, biocontrol agents, and symbiotic agents in various ecosystems and organisms. Over the past decades, there has been a notable increase in the identification and generation of both wild-type and genetically modified microbial strains from research laboratories worldwide. However, a substantial portion of the information represented in these strains remains scattered across the scientific literature. To facilitate the work of future researchers, in this perspective article, we advocate the adoption of the DNA-based natural language (DBNL) algorithm standard and then demonstrate it using a Streptomyces species as a proof of concept. This standard enables sophisticated genome sequencing and the subsequent extraction of valuable information encoded within a particular microbial species. In addition, it allows such information to be accessed for continued research and applications even if a currently cultivated microbe cannot be cultured in the future. Embracing the DBNL algorithm standard promises to enhance the efficiency and effectiveness of microbial research, paving the way for innovative solutions and discoveries in diverse fields.
Citation: Embedding DNA-based natural language in microbes for the benefit of future researchers. Digital Discovery, issue 11, pp. 2377-2383, published 2024-10-14. DOI: 10.1039/D4DD00251B. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00251b?page=search
Yuyan Yang, Yifei Lin, Shengnan Dai, Yifan Zhu, Jinyang Xi, Lili Xi, Xiaokun Gu, David J. Singh, Wenqing Zhang and Jiong Yang
High-throughput screening of thermoelectric materials from databases requires efficient and accurate computational methods. Machine-learning interatomic potentials (MLIPs) provide a promising avenue, facilitating the development of database-driven thermal transport applications through high-throughput simulations. However, the present challenge is the lack of standardized databases and openly available models for precise large-scale simulations. Here, we introduce HH130, a standardized database for 130 half-Heusler (HH) compounds in MatHub-3d (http://www.mathub3d.net), containing both MLIP models and datasets for the thermal transport of HH thermoelectrics. HH130 contains 31 891 total configurations (∼245 configurations per HH) and 390 MLIP models (three models per HH), generated using the dual adaptive sampling method to cover a wide range of thermodynamic conditions, and can be openly accessed on MatHub-3d. Comprehensive validation against first-principles calculations demonstrates that the MLIP models accurately predict energies, forces, and interatomic force constants (IFCs). The MLIP models in HH130 enabled us to efficiently compute four-phonon interactions for 80 HHs, with phonon frequencies closely matching ab initio results. We find that HHs with a valence electron count (VEC) of 8 per unit cell generally exhibit lower lattice thermal conductivities (κL) than those with a VEC of 18, due to a combination of low 2nd-order IFCs and large scattering phase spaces in the former group. Additionally, we identified several HHs that demonstrate significant reductions in κL due to four-phonon interactions. HH130 provides a robust platform for high-throughput computation of κL and aids in the discovery of next-generation thermoelectrics through machine learning.
Citation: HH130: a standardized database of machine learning interatomic potentials, datasets, and its applications in the thermal transport of half-Heusler thermoelectrics. Digital Discovery, issue 11, pp. 2201-2210, published 2024-10-11. DOI: 10.1039/D4DD00240G. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00240g?page=search
Dingqi Nai, Gabriel S. Gusmão, Zachary A. Kilwein, Fani Boukouvala and Andrew J. Medford
The temporal analysis of products (TAP) technique produces extensive transient kinetic data sets, but it is challenging to translate the large quantity of raw data into physically interpretable kinetic models, largely due to the computational scaling of existing numerical methods for fitting TAP data. In this work, we utilize kinetics-informed neural networks (KINNs), which are artificial feedforward neural networks designed to solve ordinary differential equations constrained by micro-kinetic models, to model the TAP data. We demonstrate that, under the assumption that all concentrations are known in the thin catalyst zone, KINNs can simultaneously fit the transient data, retrieve the kinetic model parameters, and interpolate unseen pulse behavior for multi-pulse experiments. We further demonstrate that, by modifying the loss function, KINNs maintain these capabilities even when precise thin-zone information is unavailable, as would be the case with real experimental TAP data. We also compare the approach to existing optimization techniques, which reveals improved noise tolerance and performance in extracting kinetic parameters. The KINNs approach offers an efficient alternative for TAP analysis and can assist in interpreting transient kinetics in complex systems over long timescales.
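The core idea of a kinetics-informed loss can be shown without a neural network: a candidate concentration profile is penalized both for missing the data and for violating the micro-kinetic ODE dc/dt = r(c) S, where S is the stoichiometry matrix. The sketch below assumes a hypothetical one-step mechanism A -> B, uses a closed-form profile in place of a trained network, and uses finite differences in place of automatic differentiation. It is not the authors' KINN implementation, and all names are illustrative.

```python
import numpy as np

# Hypothetical one-step mechanism A -> B with mass-action rate r = k * c_A.
STOICHIOMETRY = np.array([[-1.0, 1.0]])  # rows: reactions, columns: species (A, B)


def rates(conc, k):
    """Rate of the single reaction at every time point, shape (n_times, 1)."""
    return k * conc[:, [0]]


def kinetic_residual(times, conc, k):
    """Mean squared mismatch between dc/dt and the micro-kinetic right-hand side."""
    dcdt = np.gradient(conc, times, axis=0)   # finite differences stand in for autodiff
    rhs = rates(conc, k) @ STOICHIOMETRY      # shape (n_times, n_species)
    return float(np.mean((dcdt - rhs) ** 2))


def total_loss(times, conc_model, conc_data, k, weight=1.0):
    """Data misfit plus weighted ODE residual."""
    data_term = float(np.mean((conc_model - conc_data) ** 2))
    return data_term + weight * kinetic_residual(times, conc_model, k)


times = np.linspace(0.0, 5.0, 501)
k_true = 0.8
c_a = np.exp(-k_true * times)
conc_exact = np.column_stack([c_a, 1.0 - c_a])  # exact solution for c_A(0)=1, c_B(0)=0

loss_right = total_loss(times, conc_exact, conc_exact, k=k_true)
loss_wrong = total_loss(times, conc_exact, conc_exact, k=2.0 * k_true)
print(f"loss with true k: {loss_right:.2e}, loss with doubled k: {loss_wrong:.2e}")
```

Minimizing such a loss jointly over the surrogate's weights and the rate constants is what lets one model both fit the transients and recover kinetic parameters. Changing which terms enter the loss is the kind of modification the abstract refers to for cases where thin-zone concentrations are not known.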
Citation: Micro-kinetic modeling of temporal analysis of products data using kinetics-informed neural networks. Digital Discovery, issue 11, pp. 2327-2340, published 2024-10-10. DOI: 10.1039/D4DD00163J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00163j?page=search
Before a new molecular structure is registered to a chemical structure database, a duplicate check is essential to ensure the integrity of the database. The Simplified Molecular Input Line Entry Specification (SMILES) and the IUPAC International Chemical Identifier (InChI) stand out as widely used molecular identifiers for these checks. Notable limitations arise when dealing with molecules from inorganic chemistry or structures characterized by non-central stereochemistry. When the stereoinformation needs to be assigned to a group of atoms, widely used identifiers cannot describe axial and planar chirality due to the atom-centered description of a molecule. To address this limitation, we introduce a novel chemical identifier called the Molecular Barcode (MolBar). Motivated by the field of theoretical chemistry, a fragment-based approach is used in addition to the conventional atomistic description. In this approach, the 3D structure of fragments is normalized using a specialized force field and characterized by physically inspired matrices derived solely from atomic positions. The resulting permutation-invariant representation is constructed from the eigenvalue spectra, providing comprehensive information on both bonding and stereochemistry. The robustness of MolBar is demonstrated through duplication and permutation invariance tests on the Molecule3D dataset of 3.9 million molecules. A Python implementation is available as open source and can be installed via pip install molbar.
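The permutation invariance of an eigenvalue-based representation can be checked directly. The sketch below is only an illustration of that property: it uses a Coulomb-matrix-style descriptor as a stand-in, since the abstract does not specify MolBar's matrices, fragment handling, or force-field normalization, and none of those are reproduced here. The coordinates, the helper names, and the small Jacobi eigenvalue routine are all assumptions made for the example.

```python
import math


def coulomb_like_matrix(charges, coords):
    """Symmetric matrix built only from nuclear charges and atomic positions."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Sorted eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                theta = math.copysign(math.pi / 4, a[p][q]) if d == 0.0 \
                    else 0.5 * math.atan(2.0 * a[p][q] / d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q of A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q of A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Illustrative four-atom geometry (C, O, H, H); values are not from the paper.
charges = [6, 8, 1, 1]
coords = [(0.0, 0.0, 0.0), (1.21, 0.0, 0.0), (-0.55, 0.94, 0.0), (-0.60, -0.90, 0.1)]

order = [2, 0, 3, 1]  # an arbitrary relabeling of the same atoms
spec_a = jacobi_eigenvalues(coulomb_like_matrix(charges, coords))
spec_b = jacobi_eigenvalues(coulomb_like_matrix([charges[i] for i in order],
                                                [coords[i] for i in order]))
print(max(abs(x - y) for x, y in zip(spec_a, spec_b)))
```

Relabeling the atoms permutes rows and columns of the matrix together, which is a similarity transformation, so the sorted spectrum is unchanged up to numerical error. This is why an identifier built from eigenvalue spectra does not depend on atom ordering.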
{"title":"MolBar: a molecular identifier for inorganic and organic molecules with full support of stereoisomerism","authors":"Nils van Staalduinen and Christoph Bannwarth","doi":"10.1039/D4DD00208C","DOIUrl":"https://doi.org/10.1039/D4DD00208C","journal":{"name":"Digital discovery","volume":"11","pages":"2298-2319"},"publicationDate":"2024-10-10","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00208c?page=search"}
Keiichi Okubo, Jaydeep Thik, Tomoya Yamaguchi and Chen Ling
The rotating disk electrode (RDE) technique is an essential tool for studying the activity, stability, and other fundamental properties of electrocatalysts. High-quality RDE experimentation requires evenly coating the catalyst layer on the electrode surface, which relies heavily on experience and currently lacks necessary quality control. The lack of an adequate evaluation method to ensure the quality of RDE experimentation, aside from conventional judgment based on expertise, reduces efficiency, complicates data interpretation, and hinders future automation of RDE experimentation. Here we propose a simple, easy-to-execute and non-destructive method that combines microscopy imaging and artificial intelligence-based decision-making to assess the quality of as-prepared electrodes. We develop a convolutional neural network-based method that uses microscopic images of as-prepared electrodes to directly evaluate the sample quality. In a study of electrodes used for the oxygen reduction reaction, the model achieved an accuracy of over 80% in predicting sample qualities. Our method enables the removal of low-quality samples prior to the actual RDE test, thereby ensuring high-quality electrochemical experimentation and paving the way towards high-quality automated electrochemical experimentation. This approach is applicable to various electrochemical systems and highlights the potential of artificial intelligence in automated experimentation.
{"title":"Computer vision enabled high-quality electrochemical experimentation","authors":"Keiichi Okubo, Jaydeep Thik, Tomoya Yamaguchi and Chen Ling","doi":"10.1039/D4DD00213J","DOIUrl":"https://doi.org/10.1039/D4DD00213J","journal":{"name":"Digital discovery","volume":"11","pages":"2183-2191"},"publicationDate":"2024-10-04","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00213j?page=search"}