Vinay I. Hegde, Miroslava Peterson, Sarah I. Allec, Xiaonan Lu, Thiruvillamalai Mahadevan, Thanh Nguyen, Jayani Kalahe, Jared Oshiro, Robert J. Seffens, Ethan K. Nickerson, Jincheng Du, Brian J. Riley, John D. Vienna and James E. Saal
Informatics-driven approaches, such as machine learning and sequential experimental design, have shown the potential to drastically impact next-generation materials discovery and design. In this perspective, we present a few guiding principles for applying informatics-based methods towards the design of novel nuclear waste forms. We advocate for adopting a system design approach, and describe the effective usage of data-driven methods in every stage of such a design process. We demonstrate how this approach can optimally leverage physics-based simulations, machine learning surrogates, and experimental synthesis and characterization, within a feedback-driven closed-loop sequential learning framework. We discuss the importance of incorporating domain knowledge into the representation of materials, the construction and curation of datasets, the development of predictive property models, and the design and execution of experiments. We illustrate the application of this approach by successfully designing and validating Na- and Nd-containing phosphate-based ceramic waste forms. Finally, we discuss open challenges in such informatics-driven workflows and present an outlook for their widespread application for the cleanup of nuclear wastes.
Title: Towards informatics-driven design of nuclear waste forms. Journal: Digital Discovery. Published: 2024-07-09. DOI: 10.1039/D4DD00096J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00096j?page=search
Veerupaksh Singla, Qiyuan Zhao and Brett M. Savoie
The absence of computational methods to predict stressor-specific degradation susceptibilities represents a significant and costly challenge to the introduction of new materials into applications. Here, a machine-learning framework is developed that predicts stressor-specific stability scores from computationally generated reaction data. The thermal degradation of alkanes was studied as an exemplary system to demonstrate the approach. The half-lives of ∼32k alkanes were simulated under pyrolysis conditions using 59 model reactions. Using a hinge-loss function, these half-life data were used to train machine learning models to predict a scalar representing the relative stability based only on the molecular graph. These models were successful in transferability case studies using distinct training and testing splits to recapitulate known stability trends with respect to the degree of branching and alkane size. Even the simplest models showed excellent performance in these case studies, demonstrating the relative ease with which thermal stability can be learned. The stability score is also shown to be useful in a design study, where it is used as part of the objective function of a genetic algorithm to guide the search for more stable species. This work provides a framework for converting kinetic reaction data into stability scores that provide actionable design information and opens avenues for exploring more complex chemistries and stressors.
Title: Machine learning of stability scores from kinetic data. Journal: Digital Discovery. Published: 2024-07-01. DOI: 10.1039/D4DD00036F. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00036f?page=search
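The abstract above mentions training with a hinge-loss function so that a scalar score reflects relative stability. The exact loss is not given in the abstract, so the following is only a minimal sketch of one common choice, a pairwise margin ranking loss. The function name, the `margin` parameter, and the toy numbers are assumptions for illustration, not the authors' implementation.

```python
def pairwise_hinge_loss(scores, half_lives, margin=1.0):
    """Mean hinge loss over all ordered pairs (i, j) where molecule i
    has a longer simulated half-life than molecule j.

    A pair contributes zero loss once the predicted score of the more
    stable molecule exceeds the other score by at least `margin`.
    (Assumed form; the paper's exact loss is not stated in the abstract.)
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if half_lives[i] > half_lives[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy data (hypothetical): molecule 0 is the more stable one.
half_lives = [10.0, 1.0]
loss_correct_order = pairwise_hinge_loss([2.0, 0.0], half_lives)
loss_wrong_order = pairwise_hinge_loss([0.0, 2.0], half_lives)
```

Because only score differences enter the loss, the learned scalar is meaningful as a relative ranking rather than as an absolute half-life.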
Sudhanshu Singh, Rahul Kumar, Soumyashree S. Panda and Ravi S. Hegde
The vast array of shapes achievable through modern nanofabrication technologies presents a challenge in selecting the optimal design for achieving a desired optical response. While data-driven techniques, such as deep learning, hold promise for inverse design, their applicability is often limited as they typically explore only small subsets of the extensive range of shapes feasible with nanofabrication. Additionally, these models are often regarded as ‘black boxes’ that do not reveal the underlying relationship between the shape and the optical response. Here, we introduce a methodology tailored to address the challenges posed by large, complex, and diverse sets of nanostructures. Specifically, we demonstrate our approach in the context of periodic silicon metasurfaces operating in the visible wavelength range, considering large and diverse shape set variations. Our paired variational autoencoder method facilitates the creation of rich, continuous, and parameter-aligned latent space representations of the shape–response relationship. We showcase the practical utility of our approach in two key areas: (1) enabling multiple-solution inverse design and (2) conducting sensitivity analyses on a shape's optical response to nanofabrication-induced distortions. This methodology represents a significant advancement in data-driven design techniques, further unlocking the application potential of nanophotonics.
Title: Deep-learning enabled photonic nanostructure discovery in arbitrarily large shape sets via linked latent space representation learning. Journal: Digital Discovery. Published: 2024-07-01. DOI: 10.1039/D4DD00107A. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00107a?page=search
Sheryl L. Sanchez, Elham Foadian, Maxim Ziatdinov, Jonghee Yang, Sergei V. Kalinin, Yongtao Liu and Mahshid Ahmadi
A unique aspect of hybrid perovskites is their tunability, which allows the bandgap to be engineered via substitution. From the application viewpoint, this allows the creation of tandem cells between perovskites and silicon, or between two or more perovskites, with an associated increase in efficiency beyond the single-junction Shockley–Queisser limit. However, the concentration dependence of the optical bandgap in hybrid perovskite solid solutions can be non-linear and even non-monotonic, as determined by band alignments between endmembers, the presence of defect states and Urbach tails, and phase separation. Exploring new compositions therefore poses a joint problem: discovering the composition with the desired bandgap and establishing the physical model of the bandgap concentration dependence. Here we report the development of an experimental workflow based on structured Gaussian Process (sGP) models and custom sGP (c-sGP) that allows the joint discovery of the experimental behavior and the underpinning physical model. This approach is verified with simulated datasets with known ground truth and was found to accelerate the discovery of experimental behavior and the underlying physical model. The d/c-sGP approach utilizes a few calculated thin film bandgap data points to guide targeted explorations, minimizing the number of thin film preparation steps. Through iterative exploration, we demonstrate that the c-sGP algorithm combining 5 bandgap models converges rapidly, revealing a relationship in the bandgap diagram of MA₁₋ₓGAₓPb(I₁₋ₓBrₓ)₃. This approach offers a promising method for efficiently understanding the physical model of bandgap concentration dependence in binary systems, and it can also be extended to ternary or higher-dimensional systems.
Title: Physics-driven discovery and bandgap engineering of hybrid perovskites. Journal: Digital Discovery. Published: 2024-06-28. DOI: 10.1039/D4DD00080C. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00080c?page=search
Shengli Jiang, Shiyi Qin, Reid C. Van Lehn, Prasanna Balaprakash and Victor M. Zavala
Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets, and generalizes well to out-of-distribution datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
Title: Uncertainty quantification for molecular property predictions with graph neural architecture search. Journal: Digital Discovery. Published: 2024-06-25. DOI: 10.1039/D4DD00088A.
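The variance decomposition mentioned in the abstract above can be illustrated with the standard ensemble form of the law of total variance: the data (aleatoric) part is the average of the per-member predicted variances, and the model (epistemic) part is the spread of the per-member predicted means. This is a minimal sketch that assumes each ensemble member outputs a mean and a variance for a molecule. The function name and the toy numbers are hypothetical, and the abstract does not confirm that AutoGNNUQ uses exactly this form.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split the predictive variance of an ensemble for one molecule.

    member_means:     predicted mean from each ensemble member
    member_variances: predicted variance from each ensemble member
    Returns (aleatoric, epistemic, total).
    """
    aleatoric = fmean(member_variances)   # average predicted data noise
    epistemic = pvariance(member_means)   # disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic


# Toy ensemble of two members (hypothetical values).
aleatoric, epistemic, total = decompose_uncertainty(
    member_means=[1.0, 3.0],
    member_variances=[0.5, 0.5],
)
```

In this reading, a large epistemic term suggests that more or better-placed training data could help, while a large aleatoric term points to noise in the labels themselves.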
The identification of materials with exceptional properties is an essential objective to enable technological progress. We propose the application of Quality-Diversity algorithms to the field of crystal structure prediction. The objective of these algorithms is to identify a diverse set of high-performing solutions, which has been successful in a range of fields such as robotics, architecture and aeronautical engineering. As these methods rely on a high number of evaluations, we employ machine-learning surrogate models to compute the interatomic potential and material properties that are used to guide optimisation. Consequently, we also show the value of using neural networks to model crystal properties and enable the identification of novel composition–structure combinations. In this work, we specifically study the application of the MAP-Elites algorithm to predict polymorphs of TiO₂. We rediscover the known ground state, in addition to a set of other polymorphs with distinct properties. We validate our method for C, SiO₂ and SiC systems, where we show that the algorithm can uncover multiple local minima with distinct electronic and mechanical properties.
Title: Illuminating the property space in crystal structure prediction using Quality-Diversity algorithms. Authors: Marta Wolinska, Aron Walsh and Antoine Cully. Journal: Digital Discovery. Published: 2024-06-19. DOI: 10.1039/D4DD00054D. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00054d?page=search
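The MAP-Elites algorithm named in the abstract above keeps an archive of cells indexed by a behaviour descriptor and stores only the best solution found in each cell. The following is a minimal sketch on a toy one-dimensional problem. The fitness and descriptor functions, the mutation scale, and the 90% mutation probability are illustrative assumptions; in the paper the evaluations come from machine-learning surrogate models of crystal properties, which are not reproduced here.

```python
import random


def map_elites(fitness, descriptor, n_bins=10, n_iters=2000, seed=0):
    """Toy MAP-Elites over solutions x in [0, 1].

    The archive maps a cell index to (fitness, solution) and keeps
    only the best solution seen so far in each cell.
    """
    rng = random.Random(seed)
    archive = {}
    for _ in range(n_iters):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite (Gaussian step, clipped to [0, 1]).
            _, parent = rng.choice(list(archive.values()))
            x = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Otherwise sample a fresh random solution.
            x = rng.random()
        cell = min(n_bins - 1, int(descriptor(x) * n_bins))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive


# Hypothetical toy problem: fitness peaks at x = 0.5, descriptor is x itself.
archive = map_elites(fitness=lambda x: -(x - 0.5) ** 2, descriptor=lambda x: x)
```

The returned archive holds one elite per occupied cell, so it exposes good solutions across the whole descriptor range rather than a single optimum, which is the sense in which the method can return several distinct polymorphs.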
Sergio Pablo-García, Raúl Pérez-Soto, Albert Sabadell-Rendón, Diego Garay-Ruiz, Vladyslav Nosylevskyi and Núria López
In the study of chemical processes, visualizing reaction networks is pivotal for identifying crucial compounds and transformations. Traditional methods, such as network schematics and reaction path linear plots, often struggle to effectively represent complex reaction networks due to their size and intricate connectivity. Alternatives capable of dealing with complexity include graph methods, but they are not user-friendly and lack simplicity and modularity, which hinders their integration with widely used research software. This work introduces rNets, an innovative tool designed for the efficient visualization of reaction networks with a user-friendly interface, modularity, and seamless integration with existing software packages. The effectiveness of rNets is demonstrated through its application in analyzing three catalytic reactions, showcasing its potential to significantly enhance research in both homogeneous and heterogeneous catalysis. This tool not only simplifies the visualization process but also opens new avenues for exploring complex reaction networks in diverse research contexts.
Title: rNets: a standalone package to visualize reaction networks. Journal: Digital Discovery. Published: 2024-06-19. DOI: 10.1039/D4DD00087K. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00087k?page=search
Understanding how materials melt is crucial for their practical applications and development, as it enables us to predict their behavior in real-world environmental conditions. Accurate computation of melting temperatures (Tₘ) has been a long-standing pursuit involving various methods for classical potentials and first-principles calculations. However, literature Tₘ references computed for many elements with a clearly defined set of calculation parameters are rare. Herein we apply deep neural network atomistic potentials (DNPs), trained on density functional theory (DFT) generated datasets, to describe the melting temperature of 20 single-element materials across the Periodic Table using large-scale molecular dynamics simulations. Our results demonstrate high fidelity with experimental observations and also with calculated reference melting temperatures, yielding an average deviation of less than 18%. We propose a straightforward elemental-group-specific relationship between Tₘ and cohesive energy for these calculated references to provide reliable DFT-specific reference points, which we believe can be readily applied to many materials. Additionally, we compare DNP predictions for three representative elements at external pressures up to 30 GPa in molecular dynamics simulations, revealing reasonable consistency with experimental and DFT literature references despite the lack of explicit training at these high pressures. This work further extends our flexible approach to developing and modifying DNPs to create unique atomistic potentials tailored to describe atomically complex materials under extreme environmental conditions.
Title: Predicting melting temperatures across the periodic table with machine learning atomistic potentials. Authors: Christopher M. Andolina and Wissam A. Saidi. Journal: Digital Discovery. Published: 2024-06-18. DOI: 10.1039/D4DD00069B. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00069b?page=search
Zygimantas Jocys, Joanna Grundy and Katayoun Farrahi
Molecule generation in 3D space has gained attention in the past few years. These models typically have a hypothesis that they need to satisfy (i.e., shape) or they are designed to fit into a protein pocket. However, there has been limited evaluation of the 3D poses they produce. In previous work, the generated molecules are redocked and the generated poses are disregarded. Moreover, many of the generated molecules are neither synthesisable nor drug-like. To tackle these challenges we propose DrugPose, a novel benchmark framework that utilises Simbind to evaluate the generated molecules based on their coherence with the initial hypothesis formed from available data (e.g., active compounds and protein structures) and their adherence to the laws of physics. Moreover, it offers enhanced insights into synthesisability by directly cross-referencing with a commercial database and utilising the Ghose filter for assessing drug-likeness. Considering current generative methods, the percentage of generated molecules with the intended binding mode ranges from 4.7% to 15.9%, with commercial accessibility spanning 23.6% to 38.8% and the fraction fully satisfying the Ghose filter between 10% and 40%. These results highlight the need for further research to develop more reliable and transparent methodologies for 3D molecule generation.
{"title":"DrugPose: benchmarking 3D generative methods for early stage drug discovery","authors":"Zygimantas Jocys, Joanna Grundy and Katayoun Farrahi","doi":"10.1039/D4DD00076E","DOIUrl":"10.1039/D4DD00076E","url":null,"abstract":"<p >Molecule generation in 3D space has gained attention in the past few years. These models typically have a hypothesis that they need to satisfy (<em>i.e.</em> shape) or they are designed to fit into a protein pocket. However, there's been limited evaluation of the 3D poses they produce. In the previous work, the generated molecules are redocked and the generated poses are disregarded. Moreover, many of the generated molecules are not synthesisable and druglike. To tackle these challenges we propose DrugPose, a novel benchmark framework, that utilises Simbind to evaluate the generated molecules based on their coherence with the initial hypothesis formed from available data (<em>e.g.</em>, active compounds and protein structures) and their adherence to the laws of physics. Moreover, it offers enhanced insights into synthesizability by directly cross-referencing with a commercial database and utilising the Ghose filter for assessing drug-likeness. Considering current generative methods, the percentage of generated molecules with the intended binding mode ranges from 4.7% to 15.9%, with commercial accessibility spanning 23.6% to 38.8% and fully satisfying the Ghose filter between 10% and 40%. 
These results highlight the need for further research to develop more reliable and transparent methodologies for 3D molecule generation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00076e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141501272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
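The DrugPose abstract above relies on the Ghose filter to assess drug-likeness. The following is a minimal sketch of such a check, assuming the commonly quoted Ghose ranges (molecular weight 160 to 480, logP -0.4 to 5.6, molar refractivity 40 to 130, total atom count 20 to 70) and assuming the descriptors have already been computed elsewhere. The names `Descriptors`, `ghose_violations` and `passes_ghose` are hypothetical and are not taken from DrugPose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (assumed to come from a cheminformatics toolkit)."""
    mol_weight: float          # molecular weight in g/mol
    log_p: float               # octanol-water partition coefficient
    molar_refractivity: float  # molar refractivity
    atom_count: int            # total number of atoms, hydrogens included


# Commonly quoted Ghose filter ranges (inclusive bounds); an assumption of this sketch.
GHOSE_RANGES = {
    "mol_weight": (160.0, 480.0),
    "log_p": (-0.4, 5.6),
    "molar_refractivity": (40.0, 130.0),
    "atom_count": (20, 70),
}


def ghose_violations(d: Descriptors) -> list[str]:
    """Return the names of the descriptors that fall outside the Ghose ranges."""
    return [
        name
        for name, (low, high) in GHOSE_RANGES.items()
        if not low <= getattr(d, name) <= high
    ]


def passes_ghose(d: Descriptors) -> bool:
    """A molecule fully satisfies the filter only if no descriptor is out of range."""
    return not ghose_violations(d)
```

Returning the list of violated criteria, rather than only a boolean, makes it possible to report which property excludes a generated molecule; the abstract's "fully satisfying" corresponds to an empty list.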
Thomas M. Dixon, Jeanine Williams, Maximilian Besenhard, Roger M. Howard, James MacGregor, Philip Peach, Adam D. Clayton, Nicholas J. Warren and Richard A. Bourne
Efficiently developing high-performance liquid chromatography (HPLC) methods whilst adhering to quality-by-design principles is of paramount importance for impurity detection in the synthesis of active pharmaceutical ingredients. This study highlights a novel approach that fully automates HPLC method development using black-box single and multi-objective Bayesian optimization algorithms. Three continuous variables (the initial isocratic hold time, the initial organic modifier concentration and the gradient time) were adjusted to simultaneously optimize the number of peaks detected, the resolution between peaks and the method length. Two mixtures of analytes, one with seven compounds and one with eleven compounds, were investigated. The system explored the design space to find a global optimum in chromatogram quality without human assistance, and methods that gave baseline resolution were identified. Optimal operating conditions were typically reached within just 13 experiments. The single and multi-objective Bayesian optimization algorithms were compared, showing that multi-objective optimization was more suitable for HPLC method development. The multi-objective approach allowed multiple chromatogram acceptance criteria to be selected without having to repeat the entire optimization, making it a useful tool for robustness testing. This paper presents a fully “operator-free”, closed-loop HPLC method optimization process that can find optimal methods quickly when compared to other modern HPLC optimization techniques such as design of experiments, linear solvent strength models or quantitative structure-retention relationships.
{"title":"Operator-free HPLC automated method development guided by Bayesian optimization†","authors":"Thomas M. Dixon, Jeanine Williams, Maximilian Besenhard, Roger M. Howard, James MacGregor, Philip Peach, Adam D. Clayton, Nicholas J. Warren and Richard A. Bourne","doi":"10.1039/D4DD00062E","DOIUrl":"10.1039/D4DD00062E","url":null,"abstract":"<p >The need to efficiently develop high performance liquid chromatography (HPLC) methods, whilst adhering to quality by design principles is of paramount importance when it comes to impurity detection in the synthesis of active pharmaceutical ingredients. This study highlights a novel approach that fully automates HPLC method development using black-box single and multi-objective Bayesian optimization algorithms. Three continuous variables including the initial isocratic hold time, initial organic modifier concentration and the gradient time were adjusted to simultaneously optimize the number of peaks detected, the resolution between peaks and the method length. Two mixtures of analytes, one with seven compounds and one with eleven compounds, were investigated. The system explored the design space to find a global optimum in chromatogram quality without human assistance, and methods that gave baseline resolution were identified. Optimal operating conditions were typically reached within just 13 experiments. The single and multi-objective Bayesian optimization algorithms were compared to show that multi-objective optimization was more suitable for HPLC method development. This allowed for multiple chromatogram acceptance criteria to be selected without having to repeat the entire optimization, making it a useful tool for robustness testing. 
Work in this paper presents a fully “operator-free” and closed loop HPLC method optimization process that can find optimal methods quickly when compared to other modern HPLC optimization techniques such as design of experiments, linear solvent strength models or quantitative structure retention relationships.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00062e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141524882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
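The HPLC abstract above notes that multi-objective optimization lets different acceptance criteria be applied without repeating the optimization. One way to picture this is a non-dominated (Pareto) filter over already-evaluated methods, followed by cheap re-selection under different thresholds. The sketch below is only an illustration of that idea, not the authors' Bayesian optimization algorithm; the `Method` type, the function names and all numeric values are hypothetical.

```python
from typing import NamedTuple


class Method(NamedTuple):
    name: str
    peaks: int         # number of peaks detected (to maximise)
    resolution: float  # resolution score between peaks (to maximise)
    length_min: float  # method length in minutes (to minimise)


def dominates(a: Method, b: Method) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = (a.peaks >= b.peaks and a.resolution >= b.resolution
                and a.length_min <= b.length_min)
    better = (a.peaks > b.peaks or a.resolution > b.resolution
              or a.length_min < b.length_min)
    return no_worse and better


def pareto_front(methods: list[Method]) -> list[Method]:
    """Keep only the methods that no other method dominates."""
    return [m for m in methods if not any(dominates(o, m) for o in methods)]


def select(front: list[Method], min_peaks: int, min_resolution: float,
           max_length: float) -> list[Method]:
    """Apply one set of acceptance criteria to an existing Pareto front."""
    return [m for m in front
            if m.peaks >= min_peaks
            and m.resolution >= min_resolution
            and m.length_min <= max_length]


# Hypothetical evaluated methods, for illustration only.
candidates = [
    Method("A", 7, 1.6, 12.0),
    Method("B", 7, 1.2, 8.0),
    Method("C", 6, 1.8, 10.0),
    Method("D", 7, 1.1, 9.0),   # dominated by B
    Method("E", 5, 1.0, 15.0),  # dominated by A
]

front = pareto_front(candidates)
strict = select(front, min_peaks=7, min_resolution=1.5, max_length=15.0)
fast = select(front, min_peaks=7, min_resolution=1.0, max_length=10.0)
```

Because the front is computed once, changing the acceptance criteria (here, favouring resolution in `strict` and speed in `fast`) only requires another call to `select`, with no new experiments.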