Digital discovery: latest articles

ULaMDyn: enhancing excited-state dynamics analysis through streamlined unsupervised learning

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-08 DOI: 10.1039/D4DD00374H

Max Pinheiro, Matheus de Oliveira Bispo, Rafael S. Mattos, Mariana Telles do Casal, Bidhan Chandra Garain, Josene M. Toldo, Saikat Mukherjee and Mario Barbatti

The analysis of nonadiabatic molecular dynamics (NAMD) data presents significant challenges due to its high dimensionality and complexity. To address these issues, we introduce ULaMDyn, a Python-based, open-source package designed to automate the unsupervised analysis of large datasets generated by NAMD simulations. ULaMDyn integrates seamlessly with the Newton-X platform and employs advanced dimensionality reduction and clustering techniques to uncover hidden patterns in molecular trajectories, enabling a more intuitive understanding of excited-state processes. Using the photochemical dynamics of fulvene as a test case, we demonstrate how ULaMDyn efficiently identifies critical molecular geometries and critical nonadiabatic transitions. The package offers a streamlined, scalable solution for interpreting large NAMD datasets. It is poised to facilitate advances in the study of excited-state dynamics across a wide range of molecular systems.
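The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not ULaMDyn's API: the function `kmeans`, the toy 2-D points (standing in for geometries already projected onto two reduced coordinates), and the fixed starting centroids are all assumptions made for illustration, and plain k-means is only one of many clustering techniques such a package might employ.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else old
            for cl, old in zip(clusters, centroids)
        ]
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
              for p in points]
    return labels, centroids

# Hypothetical reduced coordinates for six trajectory snapshots.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the reduced coordinates would come from a dimensionality-reduction step applied to molecular descriptors, which this sketch leaves out.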

Citations: 0

Advancing predictive toxicology: overcoming hurdles and shaping the future

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-06 DOI: 10.1039/D4DD00257A

Sara Masarone, Katie V. Beckwith, Matthew R. Wilkinson, Shreshth Tuli, Amy Lane, Sam Windsor, Jordan Lane and Layla Hosseini-Gerami

Modern drug discovery projects are plagued by high failure rates, many of which have safety as the underlying cause. The drug discovery process involves selecting the right compounds from a pool of possible candidates to satisfy some pre-set requirements. As this process is costly and time-consuming, finding toxicities at later stages can result in project failure. In this context, the use of existing data from previous projects can help develop computational models (e.g. QSARs) and algorithms to speed up the identification of compound toxicity. While clinical and in vivo data continue to be fundamental, data originating from organ-on-a-chip models, cell lines and previous studies can accelerate the drug discovery process, allowing for faster identification of toxicities and thus saving time and resources.

Citations: 0

A novel approach to protein chemical shift prediction from sequences using a protein language model†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-06 DOI: 10.1039/D4DD00367E

He Zhu, Lingyue Hu, Yu Yang and Zhong Chen

Chemical shifts are crucial parameters in protein Nuclear Magnetic Resonance (NMR) experiments. Specifically, the chemical shifts of backbone atoms are essential for determining the constraints in protein structure analysis. Despite their importance, protein NMR experiments are costly and spectral analysis presents challenges due to sample impurities, complex experimental environments, and spectral overlap. Here, we propose a chemical shift prediction method that requires only protein sequences as input. This low-cost chemical shift predictor provides a chemical shift corresponding to each backbone atom, offers valuable prior information for peak assignment, and can significantly aid protein NMR spectrum analysis. Our approach leverages recent advances in pre-trained protein language models (PLMs) and employs a deep learning model to obtain chemical shifts. Different from other chemical shift prediction programs, our method does not require protein structures as input, significantly reducing costs and enhancing robustness. Our method can achieve comparable accuracy to other existing programs that require protein structures as input. In summary, this work introduces a novel method for protein chemical shift prediction and demonstrates the potential of PLMs for diverse applications.

Digital discovery, volume 2, pages 331-337. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00367e?page=search

Citations: 0

Multi-objective Bayesian optimization: a case study in material extrusion

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-06 DOI: 10.1039/D4DD00281D

Jay I. Myung, James R. Deneault, Jorge Chang, Inhan Kang, Benji Maruyama and Mark A. Pitt

Autonomous experimentation is a rapidly growing approach to materials science research. Machine learning can assist in improving the efficiency and capability of experimentation with algorithms that adaptively identify optimal design parameters that achieve one or more objectives in iterative, closed-loop fashion. Optimization in additive manufacturing, which can be slow and costly because of its complexity, stands to benefit greatly from such technologies. The present study demonstrates the application of an algorithm (multi-objective Bayesian optimization; MOBO) that optimizes two objectives simultaneously given multiple parameter inputs. The generality and robustness of MOBO are demonstrated in repeated print campaigns of two different test specimens. The results push the boundaries of integrating machine learning with autonomous experimentation for accelerated materials development in additive manufacturing and related areas.
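Optimizing two objectives at once generally yields a set of trade-off solutions rather than a single optimum. The sketch below shows only that Pareto-selection idea, assuming both objectives are to be maximized; it does not include a surrogate model or acquisition function, so it is not the MOBO algorithm of the study. The names `dominates`, `pareto_front`, and the score values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective_1, objective_2) scores for five print settings.
scores = [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1), (0.1, 0.1)]
print(pareto_front(scores))  # [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]
```

A multi-objective optimizer would propose new parameter settings expected to push this front outward; the filter above only identifies the current front among settings already evaluated.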

Citations: 0

SMARTpy: a Python package for the generation of cavity steric molecular descriptors and applications to diverse systems†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-03 DOI: 10.1039/D4DD00329B

Beck R. Miller, Ryan C. Cammarota and Matthew S. Sigman

Steric molecular descriptors designed for machine learning (ML) applications are critical for connecting structure–function relationships to mechanistic insight. However, many of these descriptors are not suitable for application to complex systems, such as catalyst reactive site pockets. In this context, we recently disclosed a new set of 3D steric molecular descriptors that were originally designed for dirhodium(II) tetra-carboxylate catalysts. Herein, we expand the spatial molding for rigid targets (SMART) descriptor toolkit by releasing SMARTpy, an automated, open-source Python API package for computational workflow integration of SMART descriptors. The impact of the structure of the molecular probe on the generation of SMART descriptors was analyzed. The resulting SMART descriptors and pocket features were found to be highly dependent upon probe selection and do not scale linearly. Flexible probes with smaller substituents can explore narrow pocket regions, resulting in a higher-resolution pocket imprint. Macrocyclic probes with larger substituents are more applicable to larger cavities with smooth boundaries, such as dirhodium paddlewheel complexes. In these cases, SMARTpy provides descriptors comparable to those from the original calculation method using UCSF Chimera. Finally, we analyzed a series of case studies demonstrating how SMART descriptors can impact other areas of catalysis, such as organocatalysis, biocatalysis, and protein pocket analysis.

Digital discovery, volume 2, pages 451-463. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00329b?page=search

Citations: 0

Digital features of chemical elements extracted from local geometries in crystal structures†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-01-03 DOI: 10.1039/D4DD00346B

Andrij Vasylenko, Dmytro Antypov, Sven Schewe, Luke M. Daniels, John B. Claridge, Matthew S. Dyer and Matthew J. Rosseinsky

Computational modelling of materials using machine learning (ML) and historical data has become integral to materials research across physical sciences. The accuracy of predictions for material properties using computational modelling is strongly affected by the choice of the numerical representation that describes a material's composition, crystal structure and constituent chemical elements. Structure, both extended and local, has a controlling effect on properties, but often only the composition of a candidate material is available. However, existing elemental and compositional descriptors lack direct access to structural insights such as the coordination geometry of an element. In this study, we introduce Local Environment-induced Atomic Features (LEAFs), which incorporate information about the statistically preferred local coordination geometry at an element in a crystal structure into descriptors for chemical elements, enabling the modelling of materials solely as compositions without requiring knowledge of their crystal structure. In the crystal structure of a material, each atomic site can be quantitatively described by similarity to common local structural motifs; by aggregating these unique features of similarity from the experimentally verified crystal structures of inorganic materials, LEAFs formulate a set of descriptors for chemical elements and compositions. The direct connection of LEAFs to the local coordination geometry enables the analysis of ML model property predictions, linking compositions to the underlying structure–property relationships. We demonstrate the versatility of LEAFs in structure-informed property predictions for compositions, mapping of chemical space in structural terms, and prioritisation of elemental substitutions. Based on the latter for predicting crystal structures of binary ionic compounds, LEAFs achieve the state-of-the-art accuracy of 86%. 
These results suggest that the structurally informed description of chemical elements and compositions developed in this work can effectively guide synthetic efforts in discovering new materials.
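The aggregation idea in the abstract can be sketched as follows. The abstract states only that per-site similarity features are aggregated into element descriptors; the plain averaging per element, the stoichiometry-weighted mean for a composition, the two motif columns, and all numbers below are assumptions chosen for illustration, not the LEAFs procedure itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def element_features(sites: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the per-site motif-similarity vectors observed for each element."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for element, vec in sites:
        if element not in sums:
            sums[element] = [0.0] * len(vec)
        sums[element] = [s + v for s, v in zip(sums[element], vec)]
        counts[element] += 1
    return {el: [s / counts[el] for s in total] for el, total in sums.items()}

def composition_features(composition: Dict[str, float],
                         features: Dict[str, List[float]]) -> List[float]:
    """Stoichiometry-weighted mean of element vectors (assumed aggregation rule)."""
    total = sum(composition.values())
    dim = len(next(iter(features.values())))
    out = [0.0] * dim
    for element, amount in composition.items():
        for k, value in enumerate(features[element]):
            out[k] += amount / total * value
    return out

# Hypothetical similarities to two motifs: (tetrahedral, octahedral).
sites = [("Na", [0.0, 1.0]), ("Na", [0.5, 0.5]), ("Cl", [0.0, 1.0])]
features = element_features(sites)
print(features["Na"])                                        # [0.25, 0.75]
print(composition_features({"Na": 1, "Cl": 1}, features))    # [0.125, 0.875]
```

The point of the sketch is that the composition vector carries structural information even though no crystal structure of the composition itself is supplied.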

Digital discovery, volume 2, pages 477-485. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00346b?page=search

Citations: 0

Catalytic resonance theory: forecasting the flow of programmable catalytic loops†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2024-12-30 DOI: 10.1039/D4DD00216D

Madeline A. Murphy, Kyle Noordhoek, Sallye R. Gathmann, Paul J. Dauenhauer and Christopher J. Bartel

Chemical transformations on catalyst surfaces occur through series and parallel reaction pathways. These complex networks and their behavior can be most simply evaluated through a three-species surface reaction loop (A* to B* to C* to A*) that is internal to the overall chemical reaction. Application of an oscillating dynamic catalyst to this reactive loop has been shown to exhibit one of three types of behavior: (1) a positive net flux of molecules about the loop in the clockwise direction, (2) a negative net flux of molecules about the loop in the counterclockwise direction, or (3) negligible flux of molecules about the loop at the limit cycle of reaction. Three-species surface loops were simulated with microkinetic modeling to assess the reaction loop behavior resulting from a catalytic surface oscillating between two or more catalyst surface energy states. Selected input parameters for the simulations spanned an 11-dimensional parameter space using 127 688 different parameter combinations. Their converged limit cycle solutions were analyzed for their loop turnover frequencies, the majority of which were found to be approximately zero. Classification and regression machine learning models were trained to predict the sign and magnitude of the loop turnover frequency and successfully performed above accessible baselines. Notably, the classification models exhibited a baseline weighted F₁ score of 0.49, whereas trained models achieved weighted F₁ scores of 0.94 and 0.96 when trained on the parameters used to define the simulations and derived rate constants, respectively. The trained models successfully predicted catalytic loop behavior, and interpretation of these models revealed all input parameters to be important for the prediction and performance of each model.

Digital discovery, volume 2, pages 411-423. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00216d?page=search

Citations: 0

Predicting homopolymer and copolymer solubility through machine learning†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2024-12-24 DOI: 10.1039/D4DD00290C

Christopher D. Stubbs, Yeonjoon Kim, Ethan C. Quinn, Raúl Pérez-Soto, Eugene Y.-X. Chen and Seonah Kim

Polymer solubility has applications in many important and diverse fields, including microprocessor fabrication, environmental conservation, paint formulation, and drug delivery, but it remains under-explored compared to its relative importance. This can be seen in the relative scarcity of solvent-based systems for recycling plastics, despite a need for efficient and selective methods amid the looming plastics and climate crises. Towards this need for better predictive tools, this work examines the use of classical and deep machine learning (ML) models for predicting categorical solubility in homopolymers and copolymers, with model architectures including random forest (RF), decision tree (DT), naive Bayes, AdaBoost, and graph neural networks (GNNs). We achieve high accuracy for both our homopolymer (82%, RF) and copolymer models (92%, RF) on unseen polymer–solvent systems in our 5-fold cross-validation studies. The relevance and applicability of our homopolymer models are then verified through in-house experiments examining the solubility of common commercial plastics, followed by an explainable AI (XAI) analysis using Shapley Additive Explanations (SHAP), which explores the relative contribution of each feature toward model predictions. We then apply our homopolymer solubility prediction model to remove unwanted or hazardous additives in polyethylene (PE) and polystyrene (PS) waste. This work demonstrates the validity/feasibility of using ML to predict homopolymer solubility, provides novel ML models for the prediction of copolymer solubility, and explains homopolymer model predictions before applying the explained model to a globally relevant waste challenge.
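SHAP attributions are based on Shapley values, which split a prediction among the input features by averaging each feature's marginal contribution over all orderings. The sketch below computes exact Shapley values for a toy two-feature scoring function; the function, the feature names `polarity_match` and `chain_length`, and the all-zero baseline are hypothetical, and practical SHAP tooling approximates these values rather than enumerating orderings.

```python
from itertools import permutations
from typing import Callable, Dict

def shapley_values(model: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    phi = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)          # start with every feature at its baseline
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the instance value
            updated = model(current)
            phi[name] += updated - previous
            previous = updated
    return {name: value / len(orders) for name, value in phi.items()}

def toy_model(f: Dict[str, float]) -> float:
    """Hypothetical solubility score with an interaction term."""
    return 2.0 * f["polarity_match"] - 1.0 * f["chain_length"] \
        + f["polarity_match"] * f["chain_length"]

instance = {"polarity_match": 1.0, "chain_length": 1.0}
baseline = {"polarity_match": 0.0, "chain_length": 0.0}
print(shapley_values(toy_model, instance, baseline))
# {'polarity_match': 2.5, 'chain_length': -0.5}
```

The two attributions sum to 2.0, the difference between the model output for the instance and for the baseline, which is the additivity property that makes these values convenient for explaining individual predictions.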

Digital discovery, volume 2, pages 424-437. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00290c?page=search

Citations: 0

Knowledge discovery from porous organic cage literature using a large language model†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2024-12-19 DOI: 10.1039/D4DD00337C

Yaoyi Su, Siyuan Yang, Yuanhan Liu, Aiting Kai, Linjiang Chen and Ming Liu

Porous organic cages (POCs) are an emerging subclass of porous materials, drawing increasing attention due to their structural tunability, modularity and processibility, with the research in this area rapidly expanding. Nevertheless, it is a time-consuming and labour-intensive process to obtain sufficient information from the extensive literature on organic molecular cages. This article presents a GPT-4-based literature reading method that incorporates multi-label text classification and a follow-up information extraction, in which the potential of GPT-4 can be fully exploited to rapidly extract valid information from the literature. In the process of multi-label text classification, the prompt-engineered GPT-4 demonstrated the ability to label text with proper recall rates according to the type of information contained in the text, including authors, affiliations, synthetic procedures, surface area, and the Cambridge Crystallographic Data Centre (CCDC) number of corresponding cages. Additionally, GPT-4 demonstrated proficiency in information extraction, effectively transforming labeled text into concise tabulated data. Furthermore, we built a chatbot based on this database, allowing for quick and comprehensive searching across the entire database and responding to cage-related questions.

Digital discovery, volume 2, pages 403-410. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00337c?page=search

Citations: 0

Activity recognition in scientific experimentation using multimodal visual encoding†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2024-12-19 DOI: 10.1039/D4DD00287C

Gianmarco Gabrieli, Irina Espejo Morales, Dimitrios Christofidellis, Mara Graziani, Andrea Giovannini, Federico Zipoli, Amol Thakkar, Antonio Foncubierta, Matteo Manica and Patrick W. Ruch

Capturing actions during scientific experimentation is a cornerstone of reproducibility and collaborative research. While large multimodal models hold promise for automatic action (or activity) recognition, their ability to provide real-time captioning of scientific actions remains to be explored. Leveraging multimodal egocentric videos and model finetuning for chemical experimentation, we study the action recognition performance of Vision Transformer (ViT) encoders coupled either to a multi-label classification head or a pretrained language model, as well as that of two state-of-the-art vision-language models, Video-LLaVA and X-CLIP. Highest fidelity was achieved for models coupled with trained classification heads or a fine-tuned language model decoder, for which individual actions were recognized with F1 scores between 0.29 and 0.57 and action sequences were transcribed at normalized Levenshtein ratios of 0.59–0.75, while inference efficiency was highest for models based on ViT encoders coupled to classifiers, yielding a 3-fold relative inference speed-up on GPU over language-assisted models. While models comprising generative language components were penalized in terms of inference time, we demonstrate that augmenting egocentric videos with gaze information increases the F1 score (0.52 → 0.61) and Levenshtein ratio (0.63 → 0.72, p = 0.047) for the language-assisted ViT encoder. Based on our evaluation of preferred model configurations, we propose the use of multimodal models for near real-time action recognition in scientific experimentation as a viable approach for automatic documentation of laboratory work.
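A normalized Levenshtein ratio compares a predicted action sequence with a reference by counting the insertions, deletions, and substitutions needed to turn one into the other. The sketch below assumes the normalization 1 - distance / max(length); the study may use a different convention, and the action names are hypothetical.

```python
from typing import Sequence

def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (insert, delete, substitute cost 1 each)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # delete x
                               current[j - 1] + 1,       # insert y
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def normalized_ratio(a: Sequence, b: Sequence) -> float:
    """Similarity in [0, 1]; 1.0 means identical sequences (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

# Hypothetical reference and predicted action sequences.
reference = ["pipette", "stir", "weigh", "pour"]
predicted = ["pipette", "weigh", "pour"]
print(normalized_ratio(reference, predicted))  # 0.75
```

Operating on lists of action labels rather than characters means that a single missed action costs one edit regardless of how long its name is.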


Citations: 0
