
Digital Discovery: latest articles

Transfer learning based on atomic feature extraction for the prediction of experimental 13C chemical shifts†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-20 DOI: 10.1039/D4DD00168K
Žarko Ivković, Jesús Jover and Jeremy Harvey

Forecasting experimental chemical shifts of organic compounds is a long-standing challenge in organic chemistry. Recent advances in machine learning (ML) have led to routines that surpass the accuracy of ab initio Density Functional Theory (DFT) in estimating experimental 13C shifts. The extraction of knowledge from other models, known as transfer learning, has demonstrated remarkable improvements, particularly in scenarios with limited data availability. However, the extent to which transfer learning improves predictive accuracy in low-data regimes for experimental chemical shift predictions remains unexplored. This study indicates that atomic features derived from a message passing neural network (MPNN) force field are robust descriptors for atomic properties. A dense network using these descriptors to predict 13C shifts achieves a mean absolute error (MAE) of 1.68 ppm. When these features are used as node labels in a simple graph neural network (GNN), the model attains a better MAE of 1.34 ppm. In contrast, embeddings from a self-supervised pre-trained 3D-aware transformer are not sufficiently descriptive for a feedforward model but show reasonable accuracy within the GNN framework, achieving an MAE of 1.51 ppm. Under low-data conditions, all transfer-learned models show a significant improvement in predictive accuracy compared to existing literature models, regardless of the sampling strategy used to select from the pool of unlabeled examples. We demonstrate that extracting atomic features from models trained on large and diverse datasets is an effective transfer learning strategy for predicting NMR chemical shifts, achieving results on par with existing literature models. This method provides several benefits, such as reduced training times, simpler models with fewer trainable parameters, and strong performance in low-data scenarios, without the need for costly ab initio data of the target property. The technique can be applied to other chemical tasks, opening many new potential applications where the amount of data is a limiting factor.
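The feature-extraction strategy in this abstract can be illustrated with a minimal sketch: per-atom feature vectors from a pre-trained model are treated as fixed inputs, and only a small head is fitted to the shifts. Everything below is an assumption made for illustration. The feature values and shifts are toy numbers, not data from the paper, and a linear head trained by gradient descent stands in for the dense network.

```python
# Toy illustration of transfer learning by feature extraction.
# The "pre-trained" atomic features are frozen inputs; only the head is trained.

def predict(weights, bias, features):
    """Linear head: shift = w . features + b."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def train_head(feature_rows, shifts, lr=0.1, epochs=2000):
    """Fit the head by full-batch gradient descent on the mean squared error."""
    n, dim = len(feature_rows), len(feature_rows[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch, before any parameter is updated.
        errors = [predict(weights, bias, x) - y
                  for x, y in zip(feature_rows, shifts)]
        for j in range(dim):
            grad_w = 2.0 / n * sum(e * x[j] for e, x in zip(errors, feature_rows))
            weights[j] -= lr * grad_w
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias

def mean_absolute_error(predicted, reference):
    """MAE, the metric quoted in ppm in the abstract."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Hypothetical frozen features for four carbon atoms and made-up shifts in ppm.
frozen_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_shifts_ppm = [100.0, 110.0, 95.0, 105.0]

weights, bias = train_head(frozen_features, toy_shifts_ppm)
fitted = [predict(weights, bias, x) for x in frozen_features]
mae_ppm = mean_absolute_error(fitted, toy_shifts_ppm)
```

Because the features are never updated, the only trainable parameters are those of the head, which is the source of the shorter training times and smaller models the abstract mentions.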

Digital discovery, vol. 11, pp. 2242-2251 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00168k?page=search
MADAS: A Python framework for assessing similarity in materials-science data
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-19 DOI: 10.1039/d4dd00258j
Martin Kuban, Santiago Rigamonti, Claudia Draxl
Computational materials science produces large quantities of data, both from high-throughput calculations and from individual studies. Extracting knowledge from this large and heterogeneous pool of data is challenging because the wide variety of computational methods and approximations makes the veracity of the available data vary significantly. One way of dealing with the problem is to use similarity measures, both to group data and to understand where possible differences come from. Here, we present MADAS, a Python framework for computing similarity relations between material properties. It can be used to automate the download of data from various sources, compute descriptors and similarities between materials, and analyze the relationship between materials through their properties, and it can incorporate a variety of existing machine learning methods. We explain the architecture of the package and demonstrate its power with representative examples.
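To make the idea of a similarity measure between materials concrete, here is a small standalone sketch. It is not the MADAS API: the function names, the descriptor vectors, and the choice of the Tanimoto coefficient are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero descriptors are treated as identical.
    return 1.0 if denom == 0 else dot / denom

def similarity_matrix(descriptors):
    """Pairwise similarities for a dict mapping material name -> descriptor."""
    names = list(descriptors)
    return {(p, q): tanimoto(descriptors[p], descriptors[q])
            for p in names for q in names}

def most_similar(name, descriptors):
    """Name of the material whose descriptor is closest to the given one."""
    others = [n for n in descriptors if n != name]
    return max(others, key=lambda n: tanimoto(descriptors[name], descriptors[n]))

# Hypothetical descriptor vectors (for example, a coarse property fingerprint).
toy_descriptors = {
    "material_A": [1.0, 0.0, 2.0],
    "material_B": [1.0, 0.1, 1.9],
    "material_C": [0.0, 3.0, 0.0],
}
closest_to_a = most_similar("material_A", toy_descriptors)
```

A similarity matrix of this kind can be passed to standard clustering routines to group entries, or inspected row by row to find calculations that disagree with their nearest neighbours.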
Exploring inhomogeneous surfaces: Ti-rich SrTiO3(110) reconstructions via active learning†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-16 DOI: 10.1039/D4DD00231H
Ralf Wanzenböck, Esther Heid, Michele Riva, Giada Franceschi, Alexander M. Imre, Jesús Carrete, Ulrike Diebold and Georg K. H. Madsen

The investigation of inhomogeneous surfaces, where various local structures coexist, is crucial for understanding interfaces of technological interest, yet it presents significant challenges. Here, we study the atomic configurations of the (2 × m) Ti-rich surfaces at (110)-oriented SrTiO3 by bringing together scanning tunneling microscopy and transferable neural-network force fields combined with evolutionary exploration. We leverage an active learning methodology to iteratively extend the training data as needed for different configurations. Training on only small well-known reconstructions, we are able to extrapolate to the complicated and diverse overlayers encountered in different regions of the inhomogeneous SrTiO3(110)-(2 × m) surface. Our machine-learning-backed approach generates several new candidate structures, in good agreement with experiment and verified using density functional theory. The approach could be extended to other complex metal oxides featuring large coexisting surface reconstructions.
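The iterative extension of the training data can be sketched as one round of an active learning loop. The selection criterion used here, disagreement within an ensemble of models, is a common choice and is assumed for illustration; the abstract does not state which criterion the authors use. The models, the reference labeller standing in for density functional theory, and the threshold are all hypothetical.

```python
import statistics

def ensemble_spread(predictions):
    """Standard deviation of the ensemble's predictions for one structure."""
    return statistics.pstdev(predictions)

def select_for_labelling(candidates, ensemble, threshold):
    """Return the candidates whose ensemble disagreement exceeds the threshold."""
    return [c for c in candidates
            if ensemble_spread([model(c) for model in ensemble]) > threshold]

def active_learning_round(training_set, candidates, ensemble, reference, threshold):
    """Label uncertain candidates with the reference method and add them to the data."""
    uncertain = select_for_labelling(candidates, ensemble, threshold)
    training_set.extend((c, reference(c)) for c in uncertain)
    return uncertain

# Toy stand-ins: a "structure" is a single number, and the models agree
# near 1.0 but diverge at 10.0, mimicking extrapolation to unseen configurations.
toy_ensemble = [
    lambda x: x,
    lambda x: x + 0.01 * x * x,
    lambda x: x - 0.01 * x * x,
]
toy_reference = lambda x: 2.0 * x  # hypothetical expensive reference calculation

training_set = []
picked = active_learning_round(training_set, [1.0, 10.0], toy_ensemble,
                               toy_reference, threshold=0.1)
```

In a full loop the models would be retrained on the enlarged training set after each round, and the process repeated until no candidate exceeds the threshold.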

Digital discovery, vol. 10, pp. 2137-2145 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00231h?page=search
Force-controlled robotic mechanochemical synthesis†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-16 DOI: 10.1039/D4DD00189C
Yusaku Nakajima, Kai Kawasaki, Yasuo Takeichi, Masashi Hamaya, Yoshitaka Ushiku and Kanta Ono

We demonstrate a mechanochemical synthesis method that uses a robotic powder grinding system to apply a precisely controlled, constant mechanical force. Despite its significance, applying a controllable constant force in macroscale mechanochemical synthesis remains challenging. To address this gap, we compared the reproducibility of mechanochemical syntheses of perovskite materials carried out by conventional manual grinding, ball milling, and our robotic approach. The robotic approach was significantly more reproducible than the conventional methods, which made detailed analysis of reaction pathways possible. By varying the grinding force and speed, we found that robotic force control can alter both the reaction rate and the reaction pathway. Robotic mechanochemical synthesis therefore has significant potential for elucidating chemical reaction mechanisms and discovering new reactions.

Digital discovery, vol. 10, pp. 2130-2136 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00189c?page=search
Knowledge graph representation of zeolitic crystalline materials†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-13 DOI: 10.1039/D4DD00166D
Aleksandar Kondinski, Pavlo Rutkevych, Laura Pascazio, Dan N. Tran, Feroz Farazi, Srishti Ganguly and Markus Kraft

Zeolites are complex and porous crystalline inorganic materials that serve as hosts for a variety of molecular, ionic and cluster species. Formal, machine-actionable representation of this chemistry presents a challenge as a variety of concepts need to be semantically interlinked. This work demonstrates the potential of knowledge engineering in overcoming this challenge. We develop ontologies OntoCrystal and OntoZeolite, enabling the representation and instantiation of crystalline zeolite information into a dynamic, interoperable knowledge graph called The World Avatar (TWA). In TWA, crystalline zeolite instances are semantically interconnected with chemical species that act as guests in these materials. Information can be obtained via custom or templated SPARQL queries administered through a user-friendly web interface. Unstructured exploration is facilitated through natural language processing using the Marie System, showcasing promise for the blended large language model – knowledge graph approach in providing accurate responses on zeolite chemistry in natural language.
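The kind of query a knowledge graph answers can be sketched with a tiny in-memory triple store. The instance and predicate names below are hypothetical and are not the actual OntoZeolite or OntoCrystal vocabulary; the pattern matching only imitates what a SPARQL engine does at much larger scale.

```python
# Hypothetical triples (subject, predicate, object).
TRIPLES = [
    ("zeolite_1", "hasFramework", "framework_X"),
    ("zeolite_1", "hasGuest", "water"),
    ("zeolite_2", "hasFramework", "framework_X"),
    ("zeolite_2", "hasGuest", "sodium_ion"),
    ("zeolite_3", "hasFramework", "framework_Y"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

def guests_in_framework(triples, framework):
    """Join two patterns: materials with the framework, then their guest species.

    Roughly equivalent, with this made-up vocabulary, to the SPARQL pattern
    { ?m :hasFramework :framework_X . ?m :hasGuest ?guest }.
    """
    materials = [s for s, _, _ in match(triples, predicate="hasFramework",
                                        obj=framework)]
    return sorted(o for m in materials
                  for _, _, o in match(triples, subject=m, predicate="hasGuest"))

guests = guests_in_framework(TRIPLES, "framework_X")
```

The join across two triple patterns is what "semantically interconnected" buys in practice: guest species can be retrieved by framework without either fact having been stored together.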

Digital discovery, vol. 10, pp. 2070-2084 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00166d?page=search
Auto-VTNA: an automatic VTNA platform for determination of global rate laws†‡
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-13 DOI: 10.1039/D4DD00111G
Daniel Dalland, Linden Schrecker and King Kuok (Mimi) Hii

The ability and desire to collect kinetic data have greatly increased in recent years, requiring more automated and quantitative methods for analysis. In this work, an automated program (Auto-VTNA) is developed to simplify the kinetic analysis workflow. Auto-VTNA allows all the reaction orders to be determined concurrently, expediting the process of kinetic analysis. Auto-VTNA performs well on noisy or sparse data sets and can handle complex reactions involving multiple reaction orders. Quantitative error analysis and facile visualisation allow users to numerically justify and robustly present their findings. Auto-VTNA can be used through a free graphical user interface (GUI), requiring no coding or expert kinetic model input from the user, and can be customised and built on if required.
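Variable time normalisation analysis (VTNA) rescales the time axis by the concentration of a species raised to a trial order and looks for the order at which profiles from different experiments overlay. The sketch below is not Auto-VTNA's code. It assumes a single species, toy data, and a deliberately simple overlay score (the residual of a straight line through the origin), which is only appropriate for the linear toy profiles used here.

```python
def normalised_time(times, concentrations, order):
    """Cumulative sum of (mean concentration over each interval)**order * dt."""
    axis = [0.0]
    for i in range(1, len(times)):
        mean_conc = 0.5 * (concentrations[i] + concentrations[i - 1])
        axis.append(axis[-1] + mean_conc ** order * (times[i] - times[i - 1]))
    return axis

def overlay_error(experiments, order):
    """RMS residual of one straight line through the origin fitted to all profiles."""
    xs, ys = [], []
    for times, catalyst, product in experiments:
        xs += normalised_time(times, catalyst, order)
        ys += product
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return (sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def best_order(experiments, candidates):
    """Trial order that gives the best overlay of the transformed profiles."""
    return min(candidates, key=lambda order: overlay_error(experiments, order))

# Toy data: product forms at a rate proportional to a constant catalyst
# concentration, so the true order in catalyst is 1.
toy_experiments = [
    ([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [0.0, 1.0, 2.0]),
    ([0.0, 1.0, 2.0], [2.0, 2.0, 2.0], [0.0, 2.0, 4.0]),
]
order_found = best_order(toy_experiments, [0.0, 0.5, 1.0, 1.5, 2.0])
```

Determining several orders concurrently, as the abstract describes, amounts to minimising such an overlay score over a grid or search space with one dimension per species.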

Digital discovery, vol. 10, pp. 2118-2129 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00111g?page=search
Physics-based reward driven image analysis in microscopy
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-12 DOI: 10.1039/D4DD00132J
K. Barakati, Hui Yuan, Amit Goyal and S. V. Kalinin

The rise of electron microscopy has expanded our ability to acquire nanometer and atomically resolved images of complex materials. The resulting vast datasets are typically analyzed by human operators, an intrinsically challenging process due to the multiple possible analysis steps and the corresponding need to build and optimize complex analysis workflows. We present a methodology based on the concept of a Reward Function coupled with Bayesian Optimization to optimize image analysis workflows dynamically. The Reward Function is engineered to closely align with the experimental objectives and broader context and is quantifiable upon completion of the analysis. Here, cross-section, high-angle annular dark field (HAADF) images of ion-irradiated (Y, Dy)Ba2Cu3O7−δ thin films were used as a model system. The reward functions were formed based on the expected materials density and atomic spacings and used to drive multi-objective optimization of the classical Laplacian-of-Gaussian (LoG) method. These results can be benchmarked against deep convolutional neural network (DCNN) segmentation. The optimized LoG* compares favorably against DCNN in the presence of additional noise. We further extend the reward function approach towards the identification of partially disordered regions, creating a physics-driven reward function and an action space of high-dimensional clustering. We posit that, with a correct definition, the reward function approach allows real-time optimization of complex analysis workflows at much higher speeds and lower computational costs than classical DCNN-based inference, ensuring results that are both precise and aligned with the human-defined objectives.
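A reward built from prior physical knowledge can be sketched on a one-dimensional toy signal. This is a heavily simplified stand-in for the workflow in the abstract: a local-maximum detector replaces the Laplacian-of-Gaussian method, a search over a handful of candidate thresholds replaces Bayesian optimization, and the signal and expected spacing are made up.

```python
def detect_peaks(signal, threshold):
    """Indices of local maxima above the threshold (stand-in for atom finding)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]

def spacing_reward(peaks, expected_spacing):
    """Higher is better: penalise deviation of the mean gap from the expected spacing."""
    if len(peaks) < 2:
        return float("-inf")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return -abs(sum(gaps) / len(gaps) - expected_spacing)

def best_threshold(signal, candidates, expected_spacing):
    """Pick the detector setting that maximises the physics-based reward."""
    return max(candidates,
               key=lambda t: spacing_reward(detect_peaks(signal, t), expected_spacing))

# Toy line profile: true "atoms" every 4 samples, weak spurious bumps in between.
toy_signal = [0.0, 1.0, 0.0, 0.3, 0.0, 1.0, 0.0, 0.3,
              0.0, 1.0, 0.0, 0.3, 0.0, 1.0, 0.0]
chosen = best_threshold(toy_signal, [0.1, 0.5, 1.5], expected_spacing=4)
```

The reward needs no labelled training images, only the expected spacing, which is why such a function can be evaluated as soon as an analysis run finishes.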

Digital discovery, vol. 10, pp. 2061-2069 PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00132j?page=search
A machine learning approach for the prediction of aqueous solubility of pharmaceuticals: a comparative model and dataset analysis†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-09 DOI: 10.1039/D4DD00065J
Mohammad Amin Ghanavati, Soroush Ahmadi and Sohrab Rohani

The effectiveness of drug treatments depends significantly on the water solubility of compounds, influencing bioavailability and therapeutic outcomes. A reliable predictive solubility tool enables drug developers to swiftly identify drugs with low solubility and implement proactive solubility enhancement techniques. The current research proposes three predictive models based on four solubility datasets (ESOL, AQUA, PHYS, OCHEM), encompassing 3942 unique molecules. Three different molecular representations were obtained, including electrostatic potential (ESP) maps, molecular graph, and tabular features (extracted from ESP maps and tabular Mordred descriptors). We conducted 3942 DFT calculations to acquire ESP maps and extract features from them. Subsequently, we applied two deep learning models, EdgeConv and Graph Convolutional Network (GCN), to the point cloud (ESP) and graph modalities of molecules. In addition, we utilized a random forest-based feature selection on tabular features, followed by mapping with XGBoost. A t-SNE analysis visualized chemical space across datasets and unique molecules, providing valuable insights for model evaluation. The proposed machine learning (ML)-based models, trained on 80% of each dataset and evaluated on the remaining 20%, showcased superior performance, particularly with XGBoost utilizing the extracted and selected tabular features. This yielded average test data Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2) values of 0.458, 0.613, and 0.918, respectively. Furthermore, an ensemble of the three models showed improvement in error metrics across all datasets, consistently outperforming each individual model. This Ensemble model was also tested on the Solubility Challenge 2019, achieving an RMSE of 0.865 and outperforming 37 models with an average RMSE of 1.62. Transferability analysis of our work further indicated robust performance across different datasets. 
Additionally, SHAP explainability for the feature-based XGBoost model provided transparency in solubility predictions, enhancing the interpretability of the results.
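The three error metrics quoted in this abstract, and the averaging of several models into an ensemble, can be written out explicitly. The sketch below uses made-up numbers and assumes a plain unweighted mean of model predictions; the abstract does not say how the authors combine their three models.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(r * r for r in residuals) / ss_tot
    return mae, rmse, r2

def ensemble_average(model_predictions):
    """Unweighted mean of several models' predictions, sample by sample."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]

# Made-up values for four molecules (for example, log-solubility).
toy_true = [0.0, 1.0, 2.0, 3.0]
toy_pred = [0.0, 1.0, 2.0, 5.0]
mae, rmse, r2 = regression_metrics(toy_true, toy_pred)
combined = ensemble_average([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

RMSE weights large errors more heavily than MAE, so a gap between the two, as in the toy values above, signals a few poorly predicted molecules rather than a uniform offset.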

A machine learning approach for the prediction of aqueous solubility of pharmaceuticals: a comparative model and dataset analysis†. Mohammad Amin Ghanavati, Soroush Ahmadi and Sohrab Rohani. Digital Discovery, 10, pp. 2085-2104. Published 2024-09-09. DOI: 10.1039/D4DD00065J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00065j?page=search
Citations: 0
Outstanding Reviewers for Digital Discovery in 2023
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-06 DOI: 10.1039/D4DD90037E

We would like to take this opportunity to thank all of Digital Discovery’s reviewers for helping to preserve quality and integrity in chemical science literature. We would also like to highlight the Outstanding Reviewers for Digital Discovery in 2023.

Digital Discovery, 10, p. 1922. Published 2024-09-06. DOI: 10.1039/D4DD90037E. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd90037e?page=search
Citations: 0
Automated processing of chromatograms: a comprehensive python package with a GUI for intelligent peak identification and deconvolution in chemical reaction analysis
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-05 DOI: 10.1039/D4DD00214H
Jan Obořil, Christian P. Haas, Maximilian Lübbesmeyer, Rachel Nicholls, Thorsten Gressling, Klavs F. Jensen, Giulio Volpin and Julius Hillenbrand

Reaction screening and high-throughput experimentation (HTE) coupled with liquid chromatography (HPLC and UHPLC) are becoming more important than ever in synthetic chemistry. With a growing number of experiments, it is increasingly difficult to ensure correct peak identification and integration, especially due to unknown side components which often overlap with the peaks of interest. We developed an improved version of the MOCCA Python package with a web-based graphical user interface (GUI) for automated processing of chromatograms, including baseline correction, intelligent peak picking, peak purity checks, deconvolution of overlapping peaks, and compound tracking. The individual automatic processing steps have been improved compared to the previous version of MOCCA to make the software more dependable and versatile. The algorithm accuracy was benchmarked using three datasets and compared to the previous MOCCA implementation and published results. The processing is fully automated with the possibility to include calibration and internal standards. The software supports chromatograms with photo-diode array detector (DAD) data from most commercial HPLC systems, and the Python package and GUI implementation are open-source to allow addition of new features and further development.
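
To make two of the processing steps named above (baseline correction and peak picking) concrete, here is a deliberately simple sketch on a synthetic single-wavelength trace. It is not MOCCA's API or algorithm: the helper names `subtract_linear_baseline` and `pick_peaks`, the straight-line baseline model, and the toy signal are all assumptions made for illustration, and the package itself works on full DAD data.

```python
import math

def gaussian(t, center, width, height):
    """Gaussian peak shape, used here only to synthesise a toy trace."""
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)

def subtract_linear_baseline(signal):
    """Subtract the straight line that joins the first and last samples."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    slope = (signal[-1] - signal[0]) / (n - 1)
    return [y - (signal[0] + slope * i) for i, y in enumerate(signal)]

def pick_peaks(signal, min_height):
    """Return indices of local maxima that are higher than min_height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]

# Toy chromatogram: two peaks on a drifting baseline, sampled every 0.01 min.
times = [i / 100 for i in range(1001)]
trace = [
    0.1 + 0.02 * t + gaussian(t, 3.0, 0.2, 1.0) + gaussian(t, 7.0, 0.2, 0.5)
    for t in times
]
corrected = subtract_linear_baseline(trace)
peak_times = [times[i] for i in pick_peaks(corrected, min_height=0.05)]
print(peak_times)
```

A straight-line baseline and a height threshold are enough for this well-separated toy case. Overlapping peaks from unknown side components, which the abstract identifies as the hard part, need the purity checks and deconvolution that the package provides.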

Digital Discovery, 10, pp. 2041-2051. Published 2024-09-05. DOI: 10.1039/D4DD00214H. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00214h?page=search
Citations: 0