
Digital discovery: latest articles

Biophysics-guided uncertainty-aware deep learning uncovers high-affinity plastic-binding peptides
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-24 DOI: 10.1039/D4DD00219A
Abdulelah S. Alshehri, Michael T. Bergman, Fengqi You and Carol K. Hall

Plastic pollution, particularly microplastics (MPs), poses a significant global threat to ecosystems and human health, necessitating innovative remediation strategies. Biocompatible and biodegradable plastic-binding peptides (PBPs) offer a potential solution through targeted adsorption, enabling subsequent MP detection or removal from the environment. A challenge in discovering PBPs is the vast combinatorial space of possible peptides (i.e., over 10¹⁵ sequences for 12-mer peptides), which far exceeds the sample sizes typically reachable by experiments or biophysics-based computational methods. One step towards addressing this issue is to train deep learning models on experimental or biophysical datasets, permitting faster and cheaper evaluation of peptides. However, deep learning predictions are not always accurate, and false positives can waste time and money on synthesis and evaluation. Here, we address this issue by combining biophysical modeling data from the Peptide Binder Design (PepBD) algorithm, the predictive power and uncertainty quantification of evidential deep learning, and metaheuristic search methods to identify high-affinity PBPs for several common plastics. Molecular dynamics simulations show that the discovered PBPs have greater median adsorption free energies than PBPs previously designed by PepBD for polyethylene (by 5%), polypropylene (by 18%), and polystyrene (by 34%). The value of including uncertainty quantification in peptide design is demonstrated by the improvement in median adsorption free energy, which grows as prediction uncertainty decreases. This robust framework accelerates peptide discovery, paving the way for effective, bio-inspired solutions to MP remediation.
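
For scale, each of the 12 positions can hold any of the 20 standard amino acids, giving 20¹² ≈ 4.1 × 10¹⁵ sequences. The uncertainty-aware screening idea can be sketched with deep evidential regression, in which a network outputs Normal-Inverse-Gamma parameters (γ, ν, α, β) for each peptide: the prediction is γ, the aleatoric uncertainty is β/(α − 1), and the epistemic uncertainty is β/(ν(α − 1)). This is a minimal illustration under assumptions, not the authors' pipeline: it assumes the four outputs are already available, treats a higher γ as a better binding score, and uses hypothetical sequences, values, and names (`EvidentialOutput`, `select_candidates`, `max_epistemic`).

```python
from dataclasses import dataclass


@dataclass
class EvidentialOutput:
    """Normal-Inverse-Gamma outputs for one peptide (hypothetical example values)."""
    sequence: str
    gamma: float  # predicted score; higher is assumed to mean stronger binding
    nu: float     # evidence for the mean, must be > 0
    alpha: float  # shape parameter, must be > 1
    beta: float   # scale parameter, must be > 0

    @property
    def aleatoric(self) -> float:
        # Expected data noise E[sigma^2] = beta / (alpha - 1)
        return self.beta / (self.alpha - 1)

    @property
    def epistemic(self) -> float:
        # Model uncertainty Var[mu] = beta / (nu * (alpha - 1))
        return self.beta / (self.nu * (self.alpha - 1))


def select_candidates(outputs, max_epistemic, top_k):
    """Discard uncertain predictions, then keep the top_k highest predicted scores."""
    confident = [o for o in outputs if o.epistemic <= max_epistemic]
    return sorted(confident, key=lambda o: o.gamma, reverse=True)[:top_k]


if __name__ == "__main__":
    preds = [
        EvidentialOutput("AAAAAAAAAAAA", gamma=0.9, nu=0.5, alpha=2.0, beta=1.0),
        EvidentialOutput("WWWWWWWWWWWW", gamma=0.8, nu=10.0, alpha=3.0, beta=0.4),
        EvidentialOutput("LLLLLLLLLLLL", gamma=0.6, nu=8.0, alpha=4.0, beta=0.3),
    ]
    # The first peptide has the best score but high epistemic uncertainty, so it is dropped
    for p in select_candidates(preds, max_epistemic=0.1, top_k=2):
        print(p.sequence, p.gamma, round(p.epistemic, 4))
```

Tightening `max_epistemic` trades the number of candidates sent on to molecular dynamics for confidence in each one, which loosely mirrors the reported trend of larger improvement at lower uncertainty.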

Digital discovery, 2, pp. 561-571 · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771220/pdf/
Citations: 0
Artificial intelligence-assisted electrochemical sensors for qualitative and semi-quantitative multiplexed analyses
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-20 DOI: 10.1039/D4DD00318G
Rocco Cancelliere, Mario Molinara, Antonio Licheri, Antonio Maffucci and Laura Micheli

This research uses artificial intelligence (AI) to enhance electrochemical peak resolution and lower detection limits in voltammetric analysis, focusing on multiplexed analyses of complex real matrices. The study investigated compounds from the quinone family (hydroquinone, benzoquinone, and catechol), analysed individually and in mixtures by cyclic and square wave voltammetry. The ferrocyanide/ferricyanide redox couple was included as a standard redox probe to provide a reference for method validation.

Digital discovery, 2, pp. 338-342 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00318g?page=search
Citations: 0
Commit: Mini article for dynamic reporting of incremental improvements to previous scholarly work
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-20 DOI: 10.1039/D4DD90053G
Alan Aspuru-Guzik, Jason E. Hein and Joshua Schrier


Digital discovery, 2, pp. 301-302 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd90053g?page=search
Citations: 0
Exploring the expertise of large language models in materials science and metallurgical engineering
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-20 DOI: 10.1039/D4DD00319E
Christophe Bajan and Guillaume Lambard

The integration of artificial intelligence into various domains is rapidly increasing, with large language models (LLMs) becoming prevalent in numerous applications. This work is part of a broader project that aims to train an LLM specifically for materials science. To assess the impact of this specialised training, the baseline performance of existing LLMs in materials science must first be established. In this study, we evaluated 15 different LLMs on the MaScQA question answering (Q&A) benchmark, which comprises questions from the Graduate Aptitude Test in Engineering (GATE) and tests models' ability to answer questions on materials science and metallurgical engineering. Our results indicate that closed-source LLMs, such as Claude-3.5-Sonnet and GPT-4o, perform best, with an overall accuracy of ∼84%, while open-source models, such as Llama3-70b and Phi3-14b, reach ∼56% and ∼43%, respectively. These findings provide a baseline for the raw capabilities of LLMs on materials science Q&A tasks and highlight the substantial improvement that prompt engineering and fine-tuning strategies could bring to open-source models. We anticipate that this work will encourage the adoption of LLMs as valuable assistants in materials science, demonstrating their utility in this specialised domain and related sub-domains.
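
Benchmark accuracy of this kind comes from grading each model answer against an answer key; numerical questions usually need a tolerance rather than string equality. The following is a hypothetical scorer, not MaScQA's official evaluation script: the record fields (`qtype`, `key`, `answer`), the question-type labels, and the 2% relative tolerance are assumptions.

```python
import math
from collections import defaultdict


def is_correct(qtype: str, key, answer, rel_tol: float = 0.02) -> bool:
    """Hypothetical grading rule: numeric tolerance or normalized option match."""
    if qtype == "numerical":
        try:
            return math.isclose(float(answer), float(key), rel_tol=rel_tol)
        except (TypeError, ValueError):
            return False  # an unparsable model answer counts as wrong
    # Multiple-choice style answers: compare trimmed, case-insensitive labels
    return str(answer).strip().upper() == str(key).strip().upper()


def accuracy_by_type(records):
    """Return (overall accuracy, {question type: accuracy})."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["qtype"]] += 1
        hits[r["qtype"]] += is_correct(r["qtype"], r["key"], r["answer"])
    overall = sum(hits.values()) / sum(totals.values())
    return overall, {t: hits[t] / totals[t] for t in totals}


if __name__ == "__main__":
    records = [
        {"qtype": "mcq", "key": "B", "answer": " b"},
        {"qtype": "mcq", "key": "C", "answer": "A"},
        {"qtype": "numerical", "key": 210.0, "answer": "208.5"},
        {"qtype": "numerical", "key": 0.35, "answer": "n/a"},
    ]
    print(accuracy_by_type(records))
```

Reporting accuracy per question type, not just overall, makes it easier to see whether a model's weakness lies in numerical reasoning or in conceptual recall.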

Digital discovery, 2, pp. 500-512 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00319e?page=search
Citations: 0
Composition and structure analyzer/featurizer for explainable machine-learning models to predict solid state structures
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-17 DOI: 10.1039/D4DD00332B
Emil I. Jaffal, Sangjoon Lee, Danila Shiryaev, Alex Vtorov, Nikhil Kumar Barua, Holger Kleinke and Anton O. Oliynyk

Traditional and non-classical machine learning models for solid-state structure prediction have predominantly relied on compositional features (derived from properties of the constituent elements) to predict the existence of a structure and its properties. However, the lack of structural information can lead to suboptimal property mapping and increased predictive uncertainty. To address this challenge, we introduce a strategy that generates and combines compositional and structural features while requiring minimal programming expertise. Our approach uses open-source, interactive Python programs named Composition Analyzer Featurizer (CAF) and Structure Analyzer Featurizer (SAF). CAF generates numerical compositional features from a list of formulae provided in an Excel file, while SAF extracts numerical structural features from a .cif file by generating a supercell. The 133 CAF features and 94 SAF features are used individually or in combination to cluster nine structure types in equiatomic AB intermetallics. The performance is comparable to that obtained with features from the JARVIS, MAGPIE, mat2vec, and OLED datasets in PLS-DA, SVM, and XGBoost models. Our SAF + CAF features provide a cost-efficient and reliable solution, even with the PLS-DA method, where a significant fraction of the most important features coincides with those identified by the more computationally intensive XGBoost models.
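
As a rough illustration of what a compositional featurizer computes, the sketch below parses a formula and derives fraction-weighted and difference features from one elemental property (Pauling electronegativity, for a handful of elements). It is not CAF's API or feature set; the function names, the small property table, and the three feature definitions are assumptions chosen for brevity.

```python
import re

# Pauling electronegativities for a few metals (illustrative subset, not CAF's data)
ELECTRONEGATIVITY = {"Al": 1.61, "Ti": 1.54, "Fe": 1.83, "Co": 1.88, "Ni": 1.91, "Cu": 1.90}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'FeAl' or 'Ni2Ti' into {element: amount}."""
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_features(formula: str, table=ELECTRONEGATIVITY) -> dict:
    """Fraction-weighted mean, max-min difference, and max/min ratio of one property.

    Raises KeyError if an element is missing from the property table.
    """
    counts = parse_formula(formula)
    total = sum(counts.values())
    values = [table[el] for el in counts]
    fractions = [n / total for n in counts.values()]
    return {
        "en_weighted_mean": sum(f * v for f, v in zip(fractions, values)),
        "en_difference": max(values) - min(values),
        "en_ratio": max(values) / min(values),
    }


if __name__ == "__main__":
    for formula in ["FeAl", "NiTi", "CoAl"]:
        print(formula, composition_features(formula))
```

A real featurizer repeats this pattern over many elemental properties and aggregation functions, which is how feature counts in the hundreds arise.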

Digital discovery, 2, pp. 548-560 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00332b?page=search
Citations: 0
Does one need to polish electrodes in an eight pattern? Automation provides the answer
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-10 DOI: 10.1039/D4DD00323C
Naruki Yoshikawa, Gun Deniz Akkoc, Sergio Pablo-García, Yang Cao, Han Hao and Alán Aspuru-Guzik

Automating electrochemical measurements can accelerate the discovery of new electroactive materials. One hurdle to automated electrochemical measurement is electrode pretreatment, because mechanical polishing is usually performed manually. Here we investigate the automation of electrochemical measurements using a robotic arm. We demonstrate automated mechanical polishing using a station with a moving polishing pad and evaluate the effect of different polishing patterns. Our automated method improved the condition of corroded electrodes, and we found that the polishing pattern had no significant effect, contrary to the common belief among practitioners that a figure-eight pattern is best for pretreatment. This research is a step toward automating electrochemistry experiments without human intervention.
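
Comparing polishing patterns requires the robot to follow programmable trajectories. A minimal sketch of waypoint generation follows, assuming the "figure eight" is modeled as a lemniscate of Gerono (x = r cos t, y = r sin t cos t) and that coordinates are in arbitrary pad units; the function names are hypothetical and this is not the authors' station code.

```python
import math


def figure_eight(n_points: int, radius: float):
    """Closed figure-eight (lemniscate of Gerono): x = r cos t, y = r sin t cos t."""
    pts = []
    for i in range(n_points):
        t = 2 * math.pi * i / n_points
        pts.append((radius * math.cos(t), radius * math.sin(t) * math.cos(t)))
    return pts


def circle(n_points: int, radius: float):
    """Closed circular path of the same nominal radius, for comparison."""
    return [
        (radius * math.cos(2 * math.pi * i / n_points),
         radius * math.sin(2 * math.pi * i / n_points))
        for i in range(n_points)
    ]


def path_length(points):
    """Length of the closed loop, including the segment back to the first waypoint."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


if __name__ == "__main__":
    for name, path in [("figure-eight", figure_eight(200, 5.0)), ("circle", circle(200, 5.0))]:
        print(name, round(path_length(path), 2))
```

A fair comparison would plausibly hold total travel distance or polishing time fixed, so `path_length` offers a quick check that two patterns are comparable before they are sent to a robot controller.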

Digital discovery, 2, pp. 326-330 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00323c?page=search
Citations: 0
Automated computational workflows for muon spin spectroscopy
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-10 DOI: 10.1039/D4DD00314D
Ifeanyi J. Onuorah, Miki Bonacci, Muhammad M. Isah, Marcello Mazzani, Roberto De Renzi, Giovanni Pizzi and Pietro Bonfà

Positive muon spin rotation and relaxation spectroscopy is a well-established experimental technique for studying materials. It provides a local probe that generally complements scattering techniques in the study of magnetic systems and is a valuable alternative for materials that show strong incoherent scattering or neutron absorption. Computational methods can effectively quantify the microscopic interactions underlying the experimentally observed signal, substantially boosting the predictive power of this technique. Here, we present an efficient set of algorithms and workflows to automate this task. In particular, we adopt the so-called DFT+μ procedure, in which the system is described within density functional theory (DFT) with the muon modeled as a hydrogen impurity. We devise an automated strategy to obtain candidate muon stopping sites, their dipolar interactions with the nuclei, and their hyperfine interactions with the electronic ground state. We validate the implementation on well-studied compounds, demonstrating the accuracy and ease of use of our protocol.
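
In DFT+μ studies, candidate muon sites are typically seeded at interstitial positions and then relaxed with the muon treated as a hydrogen impurity. The sketch below shows only a seeding step, by random sampling with a minimum-distance cutoff under periodic boundary conditions. It is an assumption for illustration rather than the workflow's actual site-generation algorithm; the cutoff, trial count, and function name are hypothetical, the DFT relaxation and the clustering of equivalent sites are omitted, and it requires NumPy.

```python
import numpy as np


def candidate_muon_sites(lattice, frac_coords, n_trials=2000, r_min=1.0, seed=0):
    """Randomly sample interstitial points at least r_min (in Angstrom) from every atom.

    lattice: 3x3 array whose rows are the lattice vectors (Angstrom).
    frac_coords: (N, 3) fractional coordinates of the host atoms.
    Returns the accepted trial positions in fractional coordinates.
    """
    rng = np.random.default_rng(seed)
    lattice = np.asarray(lattice, dtype=float)
    atoms = np.asarray(frac_coords, dtype=float)
    trials = rng.random((n_trials, 3))                # uniform fractional positions in [0, 1)
    delta = trials[:, None, :] - atoms[None, :, :]    # shape (n_trials, N, 3)
    delta -= np.round(delta)                          # minimum image; adequate for near-orthogonal cells
    dist = np.linalg.norm(delta @ lattice, axis=-1)   # Cartesian distances, shape (n_trials, N)
    return trials[dist.min(axis=1) >= r_min]


if __name__ == "__main__":
    # Hypothetical body-centred cubic cell (a = 2.87 Angstrom, roughly bcc Fe)
    bcc = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
    sites = candidate_muon_sites(np.eye(3) * 2.87, bcc, r_min=1.2)
    print(len(sites), "candidate positions kept")
```

Random seeding is cheap and unbiased, but it produces many near-duplicate positions, which is why a symmetry-aware clustering step usually follows before any expensive relaxation.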

Digital discovery, 2, pp. 523-538 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00314d?page=search
Citations: 0
General data management workflow to process tabular data in automated and high-throughput heterogeneous catalysis research
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-10 DOI: 10.1039/D4DD00350K
Erwin Lam, Tanguy Maury, Sebastian Preiss, Yuhui Hou, Hannes Frey, Caterina Barillari and Paco Laveille

Data management and processing are crucial for implementing streamlined and standardized data workflows in automated and high-throughput laboratories. Electronic laboratory notebooks (ELNs) have proven effective for managing data in combination with a laboratory information management system (LIMS) that connects data and inventory. However, streamlined data processing still poses a challenge within an ELN, especially for large datasets. Herein we present a Python library that streamlines and automates the management of tabular data generated in a data-driven, automated high-throughput laboratory, with a focus on heterogeneous catalysis R&D. This approach speeds up data processing and avoids errors introduced by manual processing. Using the library, raw data from the individual instruments involved in a project are downloaded from an ELN, merged in a relational-database fashion, processed, and re-uploaded to the ELN. Straightforward data merging is especially important, since information from multiple devices needs to be processed together. A configuration file containing all the data management information drives the merging and processing of the individual data sources. Establishing streamlined data management workflows standardizes data handling and supports the implementation and use of open research data following the Findable, Accessible, Interoperable and Reusable (FAIR) principles in heterogeneous catalysis.
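
The merge-then-process step can be sketched with pandas. In this hypothetical example, a configuration dictionary names the shared key and the order in which per-instrument tables are left-joined, and a processing function derives a CO₂ conversion column; the table names, column names, and `CONFIG` structure are assumptions, not the library's actual configuration format.

```python
import pandas as pd

# Hypothetical configuration: shared key, join type, and left-to-right join order
CONFIG = {"key": "sample_id", "how": "left", "sources": ["synthesis", "reactor", "gc"]}


def merge_sources(tables: dict, config: dict) -> pd.DataFrame:
    """Left-join per-instrument tables on the shared key, in the configured order."""
    names = config["sources"]
    merged = tables[names[0]]
    for name in names[1:]:
        merged = merged.merge(tables[name], on=config["key"], how=config["how"])
    return merged


def process(df: pd.DataFrame) -> pd.DataFrame:
    """Example processing step: CO2 conversion from inlet and outlet concentrations."""
    out = df.copy()
    out["co2_conversion"] = (out["co2_in"] - out["co2_out"]) / out["co2_in"]
    return out


if __name__ == "__main__":
    tables = {
        "synthesis": pd.DataFrame({"sample_id": ["S1", "S2"], "metal_loading_wt": [1.0, 2.0]}),
        "reactor": pd.DataFrame({"sample_id": ["S1", "S2"], "temperature_C": [250, 300]}),
        "gc": pd.DataFrame({"sample_id": ["S2", "S1"], "co2_in": [10.0, 10.0], "co2_out": [6.5, 8.0]}),
    }
    print(process(merge_sources(tables, CONFIG)))
```

Because rows are matched on `sample_id` rather than by position, the GC table can arrive in a different order from the synthesis table without misaligning results, which is the kind of manual-processing error this approach is meant to avoid.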

Digital discovery, 2, pp. 539-547 · PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00350k?page=search
Citations: 0
Predicting hydrogen atom transfer energy barriers using Gaussian process regression
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-10 DOI: 10.1039/D4DD00174E
Evgeni Ulanov, Ghulam A. Qadir, Kai Riedmiller, Pascal Friederich and Frauke Gräter

Predicting reaction barriers for arbitrary configurations from only a limited set of density functional theory (DFT) calculations would make the design of catalysts and the simulation of reactions within complex materials highly efficient. Here we propose Gaussian process regression (GPR) as the method of choice when DFT calculations are limited to hundreds or thousands of barrier calculations. For hydrogen atom transfer in proteins, an important reaction in chemistry and biology, we obtain a mean absolute error of 3.23 kcal mol⁻¹ over the range of barriers in the data set using SOAP descriptors, and similar values using the marginalized graph kernel. Thus, the two GPR models can robustly estimate reaction barriers within the large chemical and conformational space of proteins. Their predictive power is comparable to that of a graph neural network-based model, and GPR even outperforms the latter in the low-data regime. We propose GPR as a valuable tool for approximate but data-efficient modeling of chemical reactivity in complex and highly variable environments.
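
To make the GPR approach concrete, the sketch below implements exact Gaussian process regression with a squared-exponential kernel in NumPy and returns both a posterior mean and a standard deviation for each query. It is a simplification under stated assumptions: the hyperparameters are fixed rather than fitted by maximizing the marginal likelihood, and random two-dimensional vectors stand in for SOAP descriptors or a marginalized graph kernel.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: variance * exp(-|a - b|^2 / (2 * length_scale^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gpr_fit_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Exact GP posterior mean and standard deviation with fixed hyperparameters."""
    y_offset = y_train.mean()  # model residuals around the training mean
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_offset))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha + y_offset
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained variance
    return mean, np.sqrt(np.maximum(var, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(200, 2))   # stand-in for SOAP descriptor vectors
    y = np.sin(X[:, 0]) + np.cos(X[:, 1])       # stand-in for DFT barrier heights
    mean, std = gpr_fit_predict(X[:150], y[:150], X[150:])
    print("MAE:", np.mean(np.abs(mean - y[150:])), "mean std:", std.mean())
```

The predictive standard deviation is part of what makes GPR attractive with few data points: it can indicate where additional DFT barrier calculations would be most informative.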

Digital discovery, 2, pp. 513-522 · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747964/pdf/
Citations: 0
AI agents in chemical research: GVIM – an intelligent research assistant system
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-01-10 DOI: 10.1039/D4DD00398E
Kangyong Ma

This work uses instruction data collected and curated from the chemical sciences to fine-tune mainstream open-source large language models. To objectively evaluate the fine-tuned models, we developed an automated scoring system specifically for the chemistry domain, to ensure accurate and reliable evaluation results. Building on this foundation, we designed a chemical intelligent assistant system. The system uses the fine-tuned Mistral NeMo model as one of its primary models and includes a mechanism for flexibly invoking various advanced models. This design accounts for the rapid iteration of large language models, ensuring that the system can continuously leverage the latest and most capable AI models. A major highlight of the system is its deep integration of domain knowledge and requirements from chemistry. By incorporating specialized functions such as molecular visualization, SMILES string processing, and chemical literature retrieval, the system significantly enhances its practical value in chemical research and applications. Moreover, through carefully designed mechanisms for knowledge accumulation, skill acquisition, performance evaluation, and group collaboration, the system can, to a certain extent, optimize its professional abilities and interaction quality.

引用次数: 0