This Faraday Discussion explored the field of NMR crystallography and considered recent developments in experimental and theoretical approaches, together with new advances in machine learning and in the generation and handling of large amounts of data. Applications to a wide range of disordered, amorphous and dynamic systems demonstrated the range and quality of information available from this approach and the challenges that are faced in exploiting automation and developing best practice. In these closing remarks I reflect on the discussions of the current state of the art: what we want from these studies, how accurate results need to be, how best to generate models for complex materials, and what machine learning approaches can offer. The remarks close with thoughts about the future direction of the field: who will carry out this type of research, how they might do it and what their focus will be, along with the likely challenges and opportunities.
{"title":"Concluding remarks: <i>Faraday Discussion</i> on NMR crystallography.","authors":"Sharon E Ashbrook","doi":"10.1039/d4fd00155a","DOIUrl":"10.1039/d4fd00155a","url":null,"abstract":"<p><p>This <i>Faraday Discussion</i> explored the field of NMR crystallography, and considered recent developments in experimental and theoretical approaches, new advances in machine learning and in the generation and handling of large amounts of data. Applications to a wide range of disordered, amorphous and dynamic systems demonstrated the range and quality of information available from this approach and the challenges that are faced in exploiting automation and developing best practice. In these closing remarks I will reflect on the discussions on the current state of the art, questions about what we want from these studies, how accurate we need results to be, how we best generate models for complex materials and what machine learning approaches can offer. These remarks close with thoughts about the future direction of the field, who will be carrying out this type of research, how they might be doing it and what their focus will be, along with likely possible challenges and opportunities.</p>","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":" ","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142453621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianfan Jin, Veerupaksh Singla, Hsuan-Hao Hsu, Brett M Savoie
Generative models for the inverse design of molecules with particular properties have been heavily hyped, but have yet to demonstrate significant gains over machine-learning-augmented expert intuition. A major challenge of such models is their limited accuracy in predicting molecules with targeted properties in the data-scarce regime, which is the regime typical of the prized outliers that it is hoped inverse models will discover. For example, activity data for a drug target or stability data for a material may only number in the tens to hundreds of samples, which is insufficient to learn an accurate and reasonably general property-to-structure inverse mapping from scratch. We've hypothesized that the property-to-structure mapping becomes unique when a sufficient number of properties are supplied to the models during training. This hypothesis has several important corollaries if true. It would imply that data-scarce properties can be completely determined using a set of more accessible molecular properties. It would also imply that a generative model trained on multiple properties would exhibit an accuracy phase transition after achieving a sufficient size, a process analogous to what has been observed in the context of large language models. To interrogate these behaviors, we have built the first transformers trained on the property-to-molecular-graph task, which we dub "large property models" (LPMs). A key ingredient is supplementing these models during training with relatively basic but abundant chemical property data. The motivation for the large-property-model paradigm, the model architectures, and case studies are presented here.
{"title":"Large property models: a new generative machine-learning formulation for molecules.","authors":"Tianfan Jin, Veerupaksh Singla, Hsuan-Hao Hsu, Brett M Savoie","doi":"10.1039/d4fd00113c","DOIUrl":"https://doi.org/10.1039/d4fd00113c","url":null,"abstract":"<p><p>Generative models for the inverse design of molecules with particular properties have been heavily hyped, but have yet to demonstrate significant gains over machine-learning-augmented expert intuition. A major challenge of such models is their limited accuracy in predicting molecules with targeted properties in the data-scarce regime, which is the regime typical of the prized outliers that it is hoped inverse models will discover. For example, activity data for a drug target or stability data for a material may only number in the tens to hundreds of samples, which is insufficient to learn an accurate and reasonably general property-to-structure inverse mapping from scratch. We've hypothesized that the property-to-structure mapping becomes unique when a sufficient number of properties are supplied to the models during training. This hypothesis has several important corollaries if true. It would imply that data-scarce properties can be completely determined using a set of more accessible molecular properties. It would also imply that a generative model trained on multiple properties would exhibit an accuracy phase transition after achieving a sufficient size-a process analogous to what has been observed in the context of large language models. To interrogate these behaviors, we have built the first transformers trained on the property-to-molecular-graph task, which we dub \"large property models\" (LPMs). A key ingredient is supplementing these models during training with relatively basic but abundant chemical property data. 
The motivation for the large-property-model paradigm, the model architectures, and case studies are presented here.</p>","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":" ","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142805623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning has gained popularity for predicting molecular properties based on molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNN) to classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We utilize 19 datasets from Toxcast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods exhibit a slight decrease in prediction performance compared to the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods like random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When considering both performance and uncertainty, the calibrated Chemprop model and the combination of neural fingerprints with random forest or support vector classifier (SVC) yield comparable results. Surprisingly, the SVC method shows promising performance when combined with neural or count fingerprints. These findings are particularly relevant in real-world industrial projects where accurate predictions and reliable uncertainty estimates are crucial.
{"title":"Analysis of uncertainty of neural fingerprint-based models.","authors":"Christian W Feldmann, Jochen Sieg, Miriam Mathea","doi":"10.1039/d4fd00095a","DOIUrl":"10.1039/d4fd00095a","url":null,"abstract":"<p><p>Machine learning has gained popularity for predicting molecular properties based on molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNN) to classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We utilize 19 datasets from Toxcast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods exhibit a slight decrease in prediction performance compared to the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods like random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When considering both performance and uncertainty, the calibrated Chemprop model and the combination of neural fingerprints with random forest or support vector classifier (SVC) yield comparable results. Surprisingly, the SVC method shows promising performance when combined with neural or count fingerprints. 
These findings are particularly relevant in real-world industrial projects where accurate predictions and reliable uncertainty estimates are crucial.</p>","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":" ","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142337476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
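The idea of a well-calibrated probability estimate, which the abstract above relies on, can be made concrete with a small sketch. The Brier score and expected calibration error (ECE) below are common calibration measures for binary classifiers; whether the study uses exactly these metrics is not stated in the abstract, and the function names and toy inputs are illustrative assumptions.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by probability, then average the gap between the
    observed fraction of positives and the mean predicted probability,
    weighting each bin by the share of samples it holds."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_positive = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(frac_positive - mean_prob)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only (not data from the study).
    toy_probs = [0.9, 0.9, 0.1, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(brier_score(toy_probs, toy_labels))
    print(expected_calibration_error(toy_probs, toy_labels))
```

Lower values are better for both measures. A model can rank molecules well and still score poorly here if its probabilities are systematically over- or under-confident, which is the distinction between prediction performance and uncertainty quality drawn in the abstract.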
Sarah L Ko, Jordan A Dorrell, Andrew J Morris, Kent J Griffith
Lithium-rich early transition metal oxides are the source of excess removable lithium that affords high energy density to lithium-rich battery cathodes. They are also candidates for solid electrolytes in all-solid-state batteries. These highly ionic compounds are sparse on phase diagrams of thermodynamically stable oxides, but soft chemical routes offer an alternative to explore new alkali-rich crystal chemistries. In this work, a new layered polymorph of Li₃NbO₄ with coplanar [Nb₄O₁₆]¹²⁻ clusters is discovered through ion exchange chemistry. A more detailed study of the ion exchange reaction reveals that it takes place almost instantaneously, changing the crystal volume by more than 22% within seconds. The transformation of coplanar [Nb₄O₁₆]¹²⁻ in L-Li₃NbO₄ into the supertetrahedral [Nb₄O₁₆]¹²⁻ clusters found in the stable cubic c-Li₃NbO₄ is also explored. Furthermore, this synthetic pathway is extended to access a new layered polymorph of Li₃TaO₄. NMR crystallography with ⁶,⁷Li, ²³Na, and ⁹³Nb NMR, X-ray diffraction, neutron diffraction, and first-principles calculations is applied to A₃MO₄ (A = Li, Na; M = Nb, Ta) to identify local and long-range atomic structure, to monitor the unusually rapid reaction progression, and to track the phase transitions from the metastable layered phases to the known compounds found using high-temperature synthesis. A mechanism is proposed whereby some sodium is retained at short reaction times, which then undergoes proton exchange during water washing, forming a phase with hydrogen bonds bridging the coplanar [Nb₄O₁₆]¹²⁻ clusters. This study has implications for lithium-rich transition metal oxides and associated battery materials and for ion exchange chemistry in non-framework structures. The role of techniques that can detect light elements, local structure, and subtle structural changes in soft-chemical synthesis is emphasized.
{"title":"Metastable layered lithium-rich niobium and tantalum oxides <i>via</i> nearly instantaneous cation exchange.","authors":"Sarah L Ko, Jordan A Dorrell, Andrew J Morris, Kent J Griffith","doi":"10.1039/d4fd00103f","DOIUrl":"10.1039/d4fd00103f","url":null,"abstract":"<p><p>Lithium-rich early transition metal oxides are the source of excess removeable lithium that affords high energy density to lithium-rich battery cathodes. They are also candidates for solid electrolytes in all-solid-state batteries. These highly ionic compounds are sparse on phase diagrams of thermodynamically stable oxides, but soft chemical routes offer an alternative to explore new alkali-rich crystal chemistries. In this work, a new layered polymorph of Li<sub>3</sub>NbO<sub>4</sub> with coplanar [Nb<sub>4</sub>O<sub>16</sub>]<sup>12-</sup> clusters is discovered through ion exchange chemistry. A more detailed study of the ion exchange reaction reveals that it takes place almost instantaneously, changing the crystal volume by more than 22% within seconds. The transformation of coplanar [Nb<sub>4</sub>O<sub>16</sub>]<sup>12-</sup> in L-Li<sub>3</sub>NbO<sub>4</sub> into the supertetrahedral [Nb<sub>4</sub>O<sub>16</sub>]<sup>12-</sup> clusters found in the stable cubic c-Li<sub>3</sub>NbO<sub>4</sub> is also explored. Furthermore, this synthetic pathway is extended to access a new layered polymorph of Li<sub>3</sub>TaO<sub>4</sub>. NMR crystallography with <sup>6,7</sup>Li, <sup>23</sup>Na, and <sup>93</sup>Nb NMR, X-ray diffraction, neutron diffraction, and first-principles calculations is applied to A<sub>3</sub>MO<sub>4</sub> (A = Li, Na; M = Nb, Ta) to identify local and long-range atomic structure, to monitor the unusually rapid reaction progression, and to track the phase transitions from the metastable layered phases to the known compounds found using high-temperature synthesis. 
A mechanism is proposed whereby some sodium is retained at short reaction times, which then undergoes proton exchange during water washing, forming a phase with hydrogen bonds bridging the coplanar [Nb<sub>4</sub>O<sub>16</sub>]<sup>12-</sup> clusters. This study has implications for lithium-rich transition metal oxides and associated battery materials and for ion exchange chemistry in non-framework structures. The role of techniques that can detect light elements, local structure, and subtle structural changes in soft-chemical synthesis is emphasized.</p>","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":" ","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142277402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alan Aspuru-Guzik, Austin Cheng, Marta Skreta, Cher Tian Ser, Andres Guzman-Cordero, Luca Thiede, Andreas Burger, Sergio Pablo-García, Abdulrahman Aldossary, Shi Xuan Leong, Felix Strieth-Kalthoff
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline the pervasive current applications. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
{"title":"How to do impactful research in artificial intelligence for chemistry and materials science.","authors":"Alan Aspuru-Guzik, Austin Cheng, Marta Skreta, Cher Tian Ser, Andres Guzman-Cordero, Luca Thiede, Andreas Burger, Sergio Pablo-García, Abdulrahman Aldossary, Shi Xuan Leong, Felix Strieth-Kalthoff","doi":"10.1039/d4fd00153b","DOIUrl":"https://doi.org/10.1039/d4fd00153b","url":null,"abstract":"Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline the pervasive current applications. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"32 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chemical function is directly related to the spatial arrangement of atoms. Consequently, the determination of atomic-level three-dimensional structures has transformed molecular and materials science over the past 60 years. In this context, solid-state NMR has emerged to become the method of choice for atomic-level characterization of complex materials in powder form. In the following we present an overview of current methods for chemical shift driven NMR crystallography, illustrated with applications to complex materials.
{"title":"NMR Crystallography","authors":"Lyndon Emsley","doi":"10.1039/d4fd00151f","DOIUrl":"https://doi.org/10.1039/d4fd00151f","url":null,"abstract":"Chemical function is directly related to the spatial arrangement of atoms. Consequently, the determination of atomic-level three-dimensional structures has transformed molecular and materials science over the past 60 years. In this context, solid-state NMR has emerged to become the method of choice for atomic-level characterization of complex materials in powder form. In the following we present an overview of current methods for chemical shift driven NMR crystallography, illustrated with applications to complex materials","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"262 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gerd Blanke, Jan Brammer, Djordje Baljozovic, Nauman Khan, Frank Lange, Felix Bänsch, Clare A. Tovee, Ulrich Schatzschneider, Richard M Hartshorn, Sonja Herres-Pawlis
The InChI (International Chemical Identifier) standard stands as a cornerstone in chemical informatics, facilitating the structure-based identification and exchange of chemical compounds across various platforms and databases. As a unique canonical line notation, the InChI has made chemical structures searchable on the internet at a broad scale. The largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which orchestrates a series of intricate steps to generate unique identifiers for chemical compounds. Up to now, these steps have been sparsely documented and the InChI algorithm had to be seen as a black box. For the new v1.07 release, the code has been analyzed and the major steps documented; more than 3000 bugs and security issues, as well as nearly 60 Google OSS-Fuzz issues, have been fixed. New test systems have been implemented that allow users to directly test the code developments. The move to GitHub has not only made the development more transparent but will also enable external contributors to join the further development of the InChI code. The motivation for this modernisation was the urgent need for the InChI to treat molecular inorganic compounds in a meaningful way. Until now, no classic string representation has fulfilled this need of molecular inorganic chemistry. Bonds to metals are by definition disconnected, which makes most inorganic InChIs meaningless at the moment. Herein, we propose new routines to remedy this problem in the representation of molecular inorganic compounds by the InChI.
{"title":"Making the InChI FAIR and sustainable while moving to Inorganics","authors":"Gerd Blanke, Jan Brammer, Djordje Baljozovic, Nauman Khan, Frank Lange, Felix Bänsch, Clare A. Tovee, Ulrich Schatzschneider, Richard M Hartshorn, Sonja Herres-Pawlis","doi":"10.1039/d4fd00145a","DOIUrl":"https://doi.org/10.1039/d4fd00145a","url":null,"abstract":"The InChI (International Chemical Identifier) standard stands as a cornerstone in chemical informatics, facilitating the structure-based identification and exchange of chemical compounds across various platforms and databases. The InChI as a unique canonical line notation has made chemical structures searchable on the internet at a broad scale. The largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which orchestrates a series of intricate steps to generate unique identifiers for chemical compounds. Up to now, these steps have been sparsely documented and the InChI algorithm had to be seen as a black box. For the new v1.07 release, the code has been analyzed and the major steps documented, more than 3000 bugs and security issues, as well as nearly 60 Google OSS-Fuzz issues have been fixed. New test systems have been implemented that allow users to directly test the code developments. The move to GitHub has not only made the development more transparent but will also enable external contributors to join the further development of the InChI code. Motivation for this modernisation was the urgency to treat molecular inorganic compounds by the InChI in a meaningful way. Until now, no classic string representation fulfills this need of molecular inorganic chemistry. The connection of metal bonds is by definition disconnected which makes most inorganic InChIs meaningless at the moment. 
Herein, we propose new routines to remedy this problem in the representation of molecular inorganic compounds by the InChI.","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"18 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
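To illustrate what "canonical line notation" means in practice, the sketch below splits a standard InChI string into its slash-separated layers. It is a simplified illustration, not part of the InChI codebase: the function name is hypothetical, and the parser assumes a string with a formula layer followed by layers that each begin with a one-letter prefix. It does not handle identifiers without a formula layer or identifiers in which a prefix occurs more than once.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into version, formula, and prefixed layers.

    Simplified sketch: assumes the second field is the chemical formula and
    that each later field starts with a unique one-letter layer prefix.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")

    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        # e.g. 'c' = connectivity layer, 'h' = hydrogen layer
        layers[part[0]] = part[1:]
    return layers


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    for name, value in split_inchi_layers(ethanol).items():
        print(name, value)
```

For ethanol the connectivity layer `c1-2-3` carries the bonding between heavy atoms. The problem described in the abstract is that bonds to metals are disconnected before such layers are generated, so the connectivity of a coordination compound is not captured.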
Ion transport through biological channels is influenced not only by the structural properties of the channels themselves but also by the composition of the phospholipid membrane, which acts as a scaffold for these nanochannels. Drawing inspiration from how lipid membrane composition modulates ion currents, as seen in the activation of the K⁺ channel in Streptomyces A (KcsA) by anionic lipids, we propose a biomimetic nanochannel system that integrates DNA nanotechnology with two-dimensional graphene oxide (GO) nanosheets. By modifying the length of the multibranched DNA nanowires generated through the hybridization chain reaction (HCR) and varying the concentration of the linker strands that integrate these DNA nanowire structures with the GO membrane, the composition of the membrane can be effectively adjusted, consequently impacting ion transport. This method provides a strategy for developing devices with highly efficient and tunable ion transport, suitable for applications in mass transport, environmental protection, biomimetic channels, and biosensors.
{"title":"Regulation of Transmembrane Current through Modulation of Biomimetic Lipid Membrane Composition","authors":"Zhiwei Shang, Jing Zhao, Mengyu Yang, Yuling Xiao, Wenjing Chu, Yilin Cai, Xiaoqing Yi, Meihua Lin, Fan Xia","doi":"10.1039/d4fd00149d","DOIUrl":"https://doi.org/10.1039/d4fd00149d","url":null,"abstract":"Ion transport through biological channels is influenced not only by the structural properties of the channels themselves but also by the composition of the phospholipid membrane, which acts as a scaffold for these nanochannels. Drawing inspiration from how lipid membrane composition modulates ion currents, as seen in the activation of the K+ channel in Streptomyces A (KcsA) by anionic lipids, we propose a biomimetic nanochannel system that integrates DNA nanotechnology with two-dimensional graphene oxide (GO) nanosheets. By modifying the length of the multibranched DNA nanowires generated through the hybridization chain reactions (HCR) and varying the concentration of the linker strands that integrate these DNA nanowire structures with the GO membrane, the composition of the membrane can be effectively adjusted, consequently impacting ion transport. This method provides a strategy for developing devices with highly efficient and tunable ion transport, suitable for applications in mass transport, environmental protection, biomimetic channels, and biosensors.","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"7 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The widespread application of machine learning (ML) to the chemical sciences is making it very important to understand how the ML models learn to correlate chemical structures with their properties, and what can be done to improve the training efficiency whilst guaranteeing interpretability and transferability. In this work, we demonstrate the wide utility of prediction rigidities, a family of metrics derived from the loss function, in understanding the robustness of ML model predictions. We show that the prediction rigidities allow the assessment of the model not only at the global level, but also on the local or the component-wise level at which the intermediate (e.g. atomic, body-ordered, or range-separated) predictions are made. We leverage these metrics to understand the learning behavior of different ML models, and to guide efficient dataset construction for model training. We finally implement the formalism for an ML model targeting a coarse-grained system to demonstrate the applicability of the prediction rigidities to an even broader class of atomistic modeling problems.
{"title":"Prediction rigidities for data-driven chemistry","authors":"Sanggyu Chong, Filippo Bigi, Federico Grasselli, Philip Loche, Matthias Kellner, Michele Ceriotti","doi":"10.1039/d4fd00101j","DOIUrl":"https://doi.org/10.1039/d4fd00101j","url":null,"abstract":"The widespread application of machine learning (ML) to the chemical sciences is making it very important to understand how the ML models learn to correlate chemical structures with their properties, and what can be done to improve the training efficiency whilst guaranteeing interpretability and transferability. In this work, we demonstrate the wide utility of prediction rigidities, a family of metrics derived from the loss function, in understanding the robustness of ML model predictions. We show that the prediction rigidities allow the assessment of the model not only at the global level, but also on the local or the component-wise level at which the intermediate (e.g. atomic, body-ordered, or range-separated) predictions are made. We leverage these metrics to understand the learning behavior of different ML models, and to guide efficient dataset construction for model training. We finally implement the formalism for an ML model targeting a coarse-grained system to demonstrate the applicability of the prediction rigidities to an even broader class of atomistic modeling problems.","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"47 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent developments in scanning electrochemical probe techniques focus on the strategy of scanning the electrolyte. For example, scanning electrochemical cell microscopy (SECCM) holds the electrolyte in a glass capillary, while scanning gel electrochemical microscopy (SGECM) immobilizes a gel electrolyte on micro-disk electrodes or etched metal wires. In both SECCM and SGECM, the first and essential step is to bring the electrolyte probe into contact with the sample, which is very often achieved by current feedback with a constant potential applied between the probe and the sample. This work attempts to analyse theoretically the deformation of the electrolyte during this approach. For the liquid electrolyte in SECCM, surface tension is considered to counterbalance gravity and the electrostatic force in 2D cylindrical coordinates with axial symmetry. The deformation at equilibrium is solved under certain conditions. For the gel electrolyte, a viscoelastic gel is analysed with a simplified 1D geometry. Both equilibrium and dynamic approach are considered. The results suggest that, for both liquid and gel electrolytes, critical conditions exist beyond which equilibrium breaks down. When the applied potential is higher, or the distance smaller, than the threshold, the forces do not equilibrate and the electrolyte deforms until contact. The critical condition depends on the properties (surface tension for the liquid, elastic and viscous moduli for the gel) and geometry (capillary radius for the liquid, thickness for the gel) of the electrolyte. Prospects for extending the work closer to real experimental scenarios, especially SGECM, are also discussed.
{"title":"Charge induced deformation of scanning electrolyte before contact","authors":"Liang Liu","doi":"10.1039/d4fd00147h","DOIUrl":"https://doi.org/10.1039/d4fd00147h","url":null,"abstract":"The recent developments of scanning electrochemical probe techniques focus on the strategy of scanning electrolyte. For example, scanning electrochemical cell microscopy (SECCM) is based on holding the electrolyte in a glass capillary, while scanning gel electrochemical microscopy (SGECM) immobilizes the gel electrolyte on micro-disk electrodes or etched metal wires. In both SECCM and SGECM, the first and essential step is to approach the electrolyte probe to be in contact with the sample, which is very often achieved by current feedback with a constant applied potential between the probe and the sample. This work attempts to theoretically analyse the deformation of electrolyte during this approaching process. For liquid electrolyte in SECCM, surface tension is considered to counterbalance the gravity and electrostatic force in 2D cylindrical coordinates with axial symmetry. The deformation at equilibrium is solved under certain conditions. For gel electrolyte, a viscoelastic gel is analysed with simplified 1D geometry. Both equilibrium and dynamic approaching are considered. The results suggest that for both liquid and gel electrolytes, critical conditions exist for breaking the equilibrium. When applied potential is higher or the distance is lower than the threshold, the force will not equilibrate and the electrolyte will deform until contact. The critical condition depends on the properties (surface tension for liquid, elastic and viscous modulus for gel) and geometry (radius of capillary for liquid, thickness for gel) of electrolyte. 
Prospects of further extending the work closer to real experimental scenarios, especially SGECM, are also discussed.","PeriodicalId":76,"journal":{"name":"Faraday Discussions","volume":"17 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
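The existence of a threshold beyond which the forces cannot balance can be illustrated with a textbook analogue. The sketch below is not the model of the paper: it assumes a flat electrolyte surface of area `area` held by a linear elastic restoring force with stiffness `k`, attracted across an initial gap `d` by the parallel-plate electrostatic force. All names and parameter values are hypothetical. In this toy, force balance requires k·x = ε·A·V² / (2(d − x)²), which has a stable solution only while V² ≤ 8·k·d³ / (27·ε·A); at the threshold the displacement is x = d/3.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def critical_voltage(k, d, area, eps_r=1.0):
    """Largest voltage at which the toy spring/parallel-plate system can balance."""
    return math.sqrt(8.0 * k * d**3 / (27.0 * eps_r * EPS0 * area))


def equilibrium_displacement(k, d, area, voltage, eps_r=1.0):
    """Stable equilibrium displacement in [0, d/3], or None above the threshold."""
    if voltage > critical_voltage(k, d, area, eps_r):
        return None  # electrostatic pull always exceeds the restoring force

    def net_force(x):
        electrostatic = eps_r * EPS0 * area * voltage**2 / (2.0 * (d - x) ** 2)
        return electrostatic - k * x

    # net_force >= 0 at x = 0 and <= 0 at x = d/3 below the threshold,
    # and it is convex in between, so bisection finds the single stable root.
    lo, hi = 0.0, d / 3.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net_force(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions.
    k, d, area = 1.0, 1e-6, 1e-12  # N/m, m, m^2
    v_c = critical_voltage(k, d, area)
    print(equilibrium_displacement(k, d, area, 0.5 * v_c))
    print(equilibrium_displacement(k, d, area, 1.1 * v_c))
```

Because the threshold voltage scales as d^(3/2), reducing the gap at fixed potential has the same effect as raising the potential at fixed gap. This mirrors the qualitative statement in the abstract that equilibrium is lost when the potential is higher or the distance smaller than a threshold.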