Yves Grandjean, David Kreutter, Jean-Louis Reymond
Reactions in the US Patent and Trademark Office (USPTO) dataset are biased towards a few over-represented reaction types, which potentially limits their usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for each SM → P reaction. Finally, we validated each reaction by requiring TTL transformer T3 to predict P from SM + R with high confidence (>95%). We generated up to 5000 reactions per template, resulting in 27.5 million validated fictive reactions covering the chemical space of the original USPTO dataset. To illustrate the use of this dataset, we show that a single-step retrosynthesis transformer model trained on a template-equilibrated subset of 1 097 374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.
{"title":"Data augmentation in a triple transformer loop retrosynthesis model.","authors":"Yves Grandjean, David Kreutter, Jean-Louis Reymond","doi":"10.1039/d5dd00465a","DOIUrl":"https://doi.org/10.1039/d5dd00465a","url":null,"abstract":"<p><p>Reactions in the US Patent Office (USPTO) are biased towards a few over-represented reaction types, which potentially limits their usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for the SM → P reaction. Finally, we validated the prediction by requesting a high confidence prediction (>95%) for the prediction of P from SM + R by TTL transformer T3. We generated up to 5000 reactions per template, resulting in 27.5m validated fictive reactions covering the chemical space of the original USPTO dataset. To exemplify the use of this dataset, we demonstrate that a single-step retrosynthesis transformer model trained on a template equilibrated subset of 1 097 374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" ","pages":""},"PeriodicalIF":6.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12878001/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
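The generate-then-validate loop described above can be sketched as a simple filter. This is a minimal Python sketch, not the TTL code: `predict_reagents` and `predict_product` are hypothetical callables standing in for transformers T2 and T3, products are compared as plain strings (a real pipeline would canonicalize SMILES first), and the per-template cap of 5000 is assumed to count validated reactions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def build_validated_reactions(
    candidates: Iterable[tuple[str, str, str]],  # (template_id, SM, P) from template application
    predict_reagents: Callable[[str, str], str],  # hypothetical T2 stand-in: (SM, P) -> R
    predict_product: Callable[[str, str], tuple[str, float]],  # hypothetical T3 stand-in: (SM, R) -> (P, conf)
    min_confidence: float = 0.95,
    max_per_template: int = 5000,
) -> list[tuple[str, str, str]]:
    """Keep SM + R -> P reactions that the forward model reproduces with high confidence."""
    kept: list[tuple[str, str, str]] = []
    per_template: dict[str, int] = defaultdict(int)
    for template_id, sm, product in candidates:
        if per_template[template_id] >= max_per_template:
            continue  # template is full: capping counts equilibrates the dataset
        reagents = predict_reagents(sm, product)  # T2-like step
        predicted, confidence = predict_product(sm, reagents)  # T3-like step
        if predicted == product and confidence > min_confidence:
            kept.append((sm, reagents, product))
            per_template[template_id] += 1
    return kept


if __name__ == "__main__":
    toy = [("t1", "A.B", "AB"), ("t1", "C.D", "CD"), ("t2", "E.F", "EF")]
    stub_t2 = lambda sm, p: "cat"  # always proposes the same reagent
    stub_t3 = lambda sm, r: (sm.replace(".", ""), 0.99)  # "joins" the SM strings
    print(build_validated_reactions(toy, stub_t2, stub_t3, max_per_template=1))
    # [('A.B', 'cat', 'AB'), ('E.F', 'cat', 'EF')]
```

The cap is checked before any model call, so once a template is full its remaining candidates cost nothing to skip.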
Alejandra Hinostroza Caldas, Artem Kokorin, Alexandre Tkatchenko, Leonardo Medrano Sandonas
Machine learning (ML) approaches have drastically advanced the exploration of structure-property and property-property relationships in computer-aided drug discovery. A central challenge in this field is the identification of molecular descriptors that effectively capture both geometric and electronic-structure-derived features, enabling the development of reliable and interpretable predictive models. While numerous descriptors focusing solely on structural characteristics have recently been proposed, improvements in model accuracy often come at the cost of increased computational demands, restricting their practical applicability. To address this challenge, we introduce the "QUantum Electronic Descriptor" (QUED) framework, which integrates both structural and electronic data of molecules to develop ML regression models for property prediction. In this framework, a quantum-mechanical (QM) descriptor is derived from molecular and atomic properties computed with the semi-empirical density functional tight-binding (DFTB) method, which allows efficient modelling of both small and large drug-like molecules. This descriptor is combined with inexpensive geometric descriptors, capturing two-body and three-body interatomic interactions, to form comprehensive molecular representations used to train Kernel Ridge Regression and XGBoost models. As a proof of concept, we validate QUED on the QM7-X dataset, which comprises equilibrium and non-equilibrium conformations of small drug-like molecules, and show that incorporating electronic-structure data notably enhances the accuracy of ML models for predicting physicochemical properties. For biological endpoints, we find that QM properties provide some predictive value for toxicity and lipophilicity prediction, as assessed on the TDCommons-LD₅₀ and MoleculeNet benchmark datasets. Moreover, a SHapley Additive exPlanations (SHAP) analysis of the toxicity and lipophilicity models reveals that molecular orbital energies and DFTB energy components are among the most influential electronic features. Hence, our work underscores the importance of incorporating QM descriptors to enhance both the accuracy and interpretability of ML models for predicting multiple properties relevant to pharmaceutical and biological applications.
{"title":"Assessing the performance of quantum-mechanical descriptors in physicochemical and biological property prediction.","authors":"Alejandra Hinostroza Caldas, Artem Kokorin, Alexandre Tkatchenko, Leonardo Medrano Sandonas","doi":"10.1039/d5dd00411j","DOIUrl":"10.1039/d5dd00411j","url":null,"abstract":"<p><p>Machine learning (ML) approaches have drastically advanced the exploration of structure-property and property-property relationships in computer-aided drug discovery. A central challenge in this field is the identification of molecular descriptors that can effectively capture both geometric- and electronic structure-derived features, enabling the development of reliable and interpretable predictive models. While numerous descriptors focusing solely on structural characteristics have been recently proposed, improvements in model accuracy often come at the cost of increased computational demands, thereby restricting their practical applicability. To address this challenge, we introduce the \"QUantum Electronic Descriptor\" (QUED) framework, which integrates both structural and electronic data of molecules to develop ML regression models for property prediction. In doing so, a quantum-mechanical (QM) descriptor is derived from molecular and atomic properties computed using the semi-empirical density functional tight-binding (DFTB) method, which allows for efficient modelling of both small and large drug-like molecules. This descriptor is combined with inexpensive geometric descriptors-capturing two-body and three-body interatomic interactions-to form comprehensive molecular representations used to train Kernel Ridge Regression and XGBoost models. As a proof of concept, we validate QUED using the QM7-X dataset, which comprises equilibrium and non-equilibrium conformations of small drug-like molecules, demonstrating that incorporating electronic structure data notably enhances the accuracy of ML models for predicting physicochemical properties. 
For biological endpoints, we find that QM properties provide some predictive value for toxicity and lipophilicity prediction, as assessed using the TDCommons-LD<sub>50</sub> and the MoleculeNet benchmark datasets. Moreover, a SHapley Additive exPlanations (SHAP) analysis of the toxicity and lipophilicity predictive models reveals that molecular orbital energies and DFTB energy components are among the most influential electronic features. Hence, our work underscores the importance of incorporating QM descriptors to enhance both the accuracy and interpretability of ML models for predicting multiple properties relevant to pharmaceutical and biological applications.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" ","pages":""},"PeriodicalIF":6.2,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820757/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
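A minimal sketch of the descriptor-plus-regressor idea, assuming the QM and geometric blocks are simply concatenated and fed to kernel ridge regression with a Gaussian (RBF) kernel. The feature values, `gamma`, and `alpha` are hypothetical, the DFTB calculation is not shown, and XGBoost is omitted; the code is pure Python for self-containment rather than a library implementation such as scikit-learn's `KernelRidge`.

```python
import math


def combine(qm_features: list[float], geom_features: list[float]) -> list[float]:
    """Hypothetical QUED-style representation: QM block concatenated with geometric block."""
    return list(qm_features) + list(geom_features)


def rbf_kernel(x: list[float], y: list[float], gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting (fine for small toy systems)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KernelRidge:
    """Dual-form kernel ridge regression: coef = (K + alpha * I)^-1 y."""

    def __init__(self, alpha: float = 1e-6, gamma: float = 0.5):
        self.alpha, self.gamma = alpha, gamma

    def fit(self, X: list[list[float]], y: list[float]) -> "KernelRidge":
        self.X_train = X
        K = [[rbf_kernel(a, b, self.gamma) + (self.alpha if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.coef = solve(K, y)
        return self

    def predict(self, X: list[list[float]]) -> list[float]:
        return [sum(c * rbf_kernel(x, xt, self.gamma) for c, xt in zip(self.coef, self.X_train))
                for x in X]


# Toy data: one "QM" feature and one "geometric" feature per molecule (hypothetical values).
X = [combine([e], [0.5 * e]) for e in (0.0, 1.0, 2.0, 3.0)]
y = [0.0, 0.8, 0.9, 0.1]
model = KernelRidge().fit(X, y)
print([round(p, 3) for p in model.predict(X)])  # close to y because alpha is tiny
```

Concatenation keeps the two descriptor families in one kernel distance; in practice each block would usually be scaled first so that neither dominates.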
Luis H. M. Torres, Sofia M. da Silva, Joel P. Arrais, Catarina Pimentel and Bernardete Ribeiro
Correction for ‘Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework’ by Luis H. M. Torres et al., Digital Discovery, 2025, 4, 3515–3532, https://doi.org/10.1039/D5DD00276A.
{"title":"Correction: Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework","authors":"Luis H. M. Torres, Sofia M. da Silva, Joel P. Arrais, Catarina Pimentel and Bernardete Ribeiro","doi":"10.1039/D5DD90058A","DOIUrl":"https://doi.org/10.1039/D5DD90058A","url":null,"abstract":"<p >Correction for ‘Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework’ by Luis H. M. Torres <em>et al.</em>, <em>Digital Discovery</em>, 2025, <strong>4</strong>, 3515–3532, https://doi.org/10.1039/D5DD00276A.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 463-463"},"PeriodicalIF":6.2,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd90058a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Babak Mahjour, Felix Katzenburg, Emil Lammi and Tim Cernak
In this report, the pharmaceuticals listed in DrugBank were structurally mapped to a commercial catalog of chemical feedstocks through reaction-agnostic, one-step retrosynthetic decomposition. Enumerative combinatorics was used to retrosynthesize target molecules into commercially available building blocks, considering only the bond formed and the minimal substructure template of each building-block class. In contrast to the status quo in automated retrosynthesis, our algorithm may suggest reactions that do not yet exist but, if they did, could enable the synthesis of drugs in just one reaction step from commercial feedstocks. Cross-referencing synthons against commercial datasets can thus reveal valuable reaction classes for development, in addition to streamlining drug production. Decomposed synthons were linked to target molecules by transformations that form one bond after elimination of each synthon's reactive functional handle, as indicated by its building-block class. Specific reactivities were analyzed after post hoc refinement and clustering of commercial synthons. Maps between boronates, bromides, iodides, amines, acids, chlorides, alcohols, and various C–H motifs that form alkyl–alkyl, alkyl–aryl, and aryl–aryl carbon–carbon, carbon–nitrogen, and carbon–oxygen bonds are reported herein, with specific examples for each.
{"title":"One step retrosynthesis of drugs from commercially available chemical building blocks and conceivable coupling reactions","authors":"Babak Mahjour, Felix Katzenburg, Emil Lammi and Tim Cernak","doi":"10.1039/D5DD00310E","DOIUrl":"https://doi.org/10.1039/D5DD00310E","url":null,"abstract":"<p >In this report, the pharmaceuticals listed in DrugBank were structurally mapped to a commercial catalog of chemical feedstocks through reaction agnostic one step retrosynthetic decomposition. Enumerative combinatorics was utilized to retrosynthesize target molecules into commercially available building blocks, wherein only the bond formed and the minimal substructure template of each building block class are considered. In contrast to the status quo in automated retrosynthesis, our algorithm may suggest reactions that do not yet exist but, if they did, could enable the synthesis of drugs in just one reaction step from commercial feedstocks. Cross-referencing synthons to commercial datasets can thus reveal valuable reaction classes for development in addition to streamlining drug production. Decomposed synthons were linked to target molecules by transformations that form one bond after the elimination of each synthon's respective reactive functional handle, as indicated by their building block class. Specific reactivities were analyzed after <em>post hoc</em> refinement and clustering of commercial synthons. 
Maps between boronates, bromides, iodides, amines, acids, chlorides, alcohols, and various C–H motifs to form alkyl–alkyl, alkyl–aryl, and aryl–aryl carbon–carbon, carbon–nitrogen, and carbon–oxygen bonds are reported herein, with specific examples for each provided.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 153-160"},"PeriodicalIF":6.2,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00310e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
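The "bond formed plus building-block handle" idea can be illustrated with a toy enumerator. Everything here is hypothetical: the `HANDLES` and `BOND_CLASSES` tables, the `[*]` attachment-point convention, and the two-molecule catalog. Lookups use raw SMILES strings, so the sketch assumes the catalog is written in the same form as the substituted fragments; a real implementation would canonicalize structures with a cheminformatics toolkit.

```python
# Hypothetical handle substituted for the attachment point "[*]" of each building-block class.
# An empty string means the handle is an (implicit) hydrogen, e.g. the N-H of an amine.
HANDLES = {"bromide": "Br", "iodide": "I", "boronic acid": "B(O)O", "amine": ""}

# Hypothetical pairs of building-block classes that could form each bond type in a
# "conceivable" coupling, whether or not such a reaction is known today.
BOND_CLASSES = {
    "aryl-N": [("bromide", "amine"), ("iodide", "amine")],
    "aryl-aryl": [("bromide", "boronic acid"), ("boronic acid", "bromide")],
}


def one_step_routes(disconnections, catalog):
    """Return (bond type, block A, block B) for each disconnection whose two blocks are purchasable.

    Each disconnection is (bond_type, fragment_a, fragment_b); the fragments are SMILES
    strings carrying a "[*]" attachment point at the broken bond.
    """
    routes = []
    for bond_type, frag_a, frag_b in disconnections:
        for class_a, class_b in BOND_CLASSES.get(bond_type, []):
            block_a = frag_a.replace("[*]", HANDLES[class_a])
            block_b = frag_b.replace("[*]", HANDLES[class_b])
            if block_a in catalog and block_b in catalog:
                routes.append((bond_type, block_a, block_b))
    return routes


# Toy target 4-(4-methoxyphenyl)morpholine, cut at its aryl C-N bond.
cuts = [("aryl-N", "COc1ccc([*])cc1", "C1COCCN1[*]")]
print(one_step_routes(cuts, {"COc1ccc(Br)cc1", "C1COCCN1"}))
# [('aryl-N', 'COc1ccc(Br)cc1', 'C1COCCN1')]
```

The iodide variant is enumerated too but dropped because 4-iodoanisole is not in the toy catalog, which is the kind of catalog cross-referencing the abstract describes.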
Kento Murakami, Yudai Yamaguchi, Yo Kato, Kazuki Ishikawa, Naoto Tanibata, Hayami Takeda, Masanobu Nakayama and Masayuki Karasuyama
Lithium-ion-conductive oxide materials have attracted considerable attention as solid electrolytes for all-solid-state batteries. In particular, LiZr₂(PO₄)₃-related compounds are promising for high-energy-density devices using metallic lithium anodes, but their ionic conductivity needs further enhancement. In general, Li-ion conductivity is governed by mechanisms operating on two distinct length scales. At the atomic scale, point defects and the associated migration barriers within the crystal lattice are critical, whereas at the micrometre scale, porosity and grain-boundary characteristics that develop during sintering become the dominant factors. These coupled effects make systematic optimization of conductivity difficult. In particular, microstructural analysis has often relied on researchers' intuitive interpretation of scanning electron microscopy (SEM) images. Here, we apply a convolutional neural network (CNN), a deep-learning approach that has advanced rapidly in image analysis, to SEM images of LiZr₂(PO₄)₃-based electrolytes. By combining image-derived features with conventional vector descriptors (composition, sintering parameters, etc.), our regression model achieved an R² of 0.871. Furthermore, visual-interpretability analysis of the trained CNN highlighted grain-boundary regions as low-conductivity areas. These findings demonstrate that deep-learning-based SEM analysis enables automated, quantitative evaluation of ionic conductivity and offers a powerful tool for accelerating the development of solid electrolyte materials.
{"title":"Deep learning based SEM image analysis for predicting ionic conductivity in LiZr2(PO4)3-based solid electrolytes","authors":"Kento Murakami, Yudai Yamaguchi, Yo Kato, Kazuki Ishikawa, Naoto Tanibata, Hayami Takeda, Masanobu Nakayama and Masayuki Karasuyama","doi":"10.1039/D5DD00232J","DOIUrl":"https://doi.org/10.1039/D5DD00232J","url":null,"abstract":"<p >Lithium-ion-conductive oxide materials have attracted considerable attention as solid electrolytes for all-solid-state batteries. In particular, LiZr<small><sub>2</sub></small>(PO<small><sub>4</sub></small>)<small><sub>3</sub></small>-related compounds are promising for high-energy-density devices using metallic lithium anodes, but further enhancement of their ionic conductivity is requested. In general, Li-ion conductivity is influenced by mechanisms operating on two distinct length scales. At the atomic scale, point defects and the associated migration barriers within the crystal lattice are critical, whereas at the micrometre scale, porosity and grain-boundary characteristics that develop during sintering become the dominant factors. These coupled effects make systematic optimization of conductivity difficult. In particular, microstructural analysis has often relied on researchers' intuitive interpretation of scanning electron microscopy (SEM) images. Here, we apply a convolutional neural network (CNN), a deep-learning approach that has seen rapid advances in image analysis, to SEM images of LiZr<small><sub>2</sub></small>(PO<small><sub>4</sub></small>)<small><sub>3</sub></small>-based electrolytes. By combining image-derived features with conventional vector descriptors (composition, sintering parameters, <em>etc.</em>), our regression model achieved an <em>R</em><small><sup>2</sup></small> of 0.871. Furthermore, visual-interpretability analysis of the trained CNN revealed that grain-boundary regions were highlighted as low-conductivity areas. 
These findings demonstrate that deep-learning-based SEM analysis enables automated, quantitative evaluation of ionic conductivity and offers a powerful tool for accelerating the development of solid electrolyte materials.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 453-462"},"PeriodicalIF":6.2,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00232j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
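For reference, the R² quoted above is the coefficient of determination. A minimal sketch of the standard definition follows; the numbers are toy values, not data from the study, and the abstract does not say whether conductivity was modeled on a log scale.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot


# Toy values only (for example, log10 conductivities); not data from the study.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # ≈ 0.98
```

An R² of 0.871 therefore means the model's squared errors amount to about 13% of the variance of the measured values around their mean.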
Steven G. Arturo, Clyde Fare, Kaoru Aou, Dan Dermody, Will Edsall, Jillian Emerson, Kathryn Grzesiak, Arjita Kulshreshtha, Paul Mwasame, Edward O. Pyzer-Knapp and Jed Pitera
Phase diagrams of complex fluids are essential tools for understanding solubility and miscibility. Using a new objective function coupled with a constrained Bayesian optimization algorithm, we demonstrate efficient location of phase boundaries in a sample two-phase ternary system modeled with polymer self-consistent field theory, routinely requiring 50% fewer observations than an exhaustive search. Our approach is general and gradient-free, and can be applied to either simulation or experimental campaigns.
{"title":"Efficient simulation of complex fluid phase diagrams with Bayesian optimization","authors":"Steven G. Arturo, Clyde Fare, Kaoru Aou, Dan Dermody, Will Edsall, Jillian Emerson, Kathryn Grzesiak, Arjita Kulshreshtha, Paul Mwasame, Edward O. Pyzer-Knapp and Jed Pitera","doi":"10.1039/D5DD00150A","DOIUrl":"https://doi.org/10.1039/D5DD00150A","url":null,"abstract":"<p >Phase diagrams of complex fluids are essential tools for understanding solubility and miscibility. Using a new objective function coupled with a constrained Bayesian optimization algorithm, we demonstrate the efficient location of phase boundaries in a sample two-phase ternary modeled using polymer self-consistent field theory, regularly seeing 50% fewer observations than an exhaustive search. Our approach is general, gradient-free, and can be applied to either simulation or experimental campaigns.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 88-92"},"PeriodicalIF":6.2,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00150a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
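The abstract does not give the new objective function or the constraints, so the sketch below swaps in a well-known alternative for the same task: the "straddle" acquisition heuristic for level-set estimation, 1.96σ − |μ − t|, on a zero-mean Gaussian-process surrogate. The one-dimensional toy function `x - 0.37` (a hypothetical order parameter that changes sign at a boundary), the kernel length scale, the noise level, and the grid are all assumptions. Instead of visiting every grid point, each new observation goes where the boundary is both plausible and uncertain.

```python
import math


def rbf(a: float, b: float, length: float = 0.15) -> float:
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K: list[list[float]]) -> list[list[float]]:
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L: list[list[float]], b: list[float]) -> list[float]:
    """Solve L x = b for lower-triangular L."""
    x: list[float] = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def gp_fit(xs, ys, noise=1e-4):
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, forward_sub(L, ys)  # w = L^-1 y


def gp_predict(xs, L, w, x):
    """Posterior mean v.w and std sqrt(k(x,x) - v.v), with v = L^-1 k(xs, x)."""
    v = forward_sub(L, [rbf(a, x) for a in xs])
    mean = sum(vi * wi for vi, wi in zip(v, w))
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def straddle(mean, std, threshold, beta=1.96):
    """High where the model is uncertain and the predicted value is near the boundary level."""
    return beta * std - abs(mean - threshold)


def locate_boundary(f, threshold, grid, init, n_iter):
    xs = list(init)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        L, w = gp_fit(xs, ys)
        candidates = [x for x in grid if x not in xs]
        best = max(candidates, key=lambda x: straddle(*gp_predict(xs, L, w, x), threshold))
        xs.append(best)
        ys.append(f(best))
    return xs, ys


grid = [i / 100 for i in range(101)]
xs, ys = locate_boundary(lambda x: x - 0.37, 0.0, grid, [0.0, 1.0], 8)
print(sorted(xs))  # 10 observations instead of 101 grid evaluations
```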
Shendong Tan, Bochun Liang, Dexin Lu, Chaoyuan Ji, Wenke Ji, Zihui Li and Tingzheng Hou
Solid polymer electrolytes exhibit limitations in room-temperature ionic conductivity and electrochemical stability. While molecular simulations and electronic-structure theory can sample these key properties at the molecular scale, the field currently lacks integrated, automated tools for end-to-end assessment. We introduce polymer electrolyte modeling and discovery (PEMD), an open-source Python framework that unifies polymer construction, force field parameterization, multiscale simulation, and property analysis for polymer electrolytes. Its analysis suite spans transport properties, transport mechanisms, and electrochemical stability. PEMD achieves a 100% success rate in constructing a collection of 656 homopolymers. The automated molecular dynamics workflow reproduces experimental ionic conductivities for 18 reported systems (Spearman ρ = 0.819; MAE = 0.684 in log₁₀(S cm⁻¹)). Specifically, for poly(ethylene oxide)/LiTFSI electrolytes, PEMD captures the canonical non-monotonic dependence of ionic conductivity on salt concentration using built-in default settings. The workflow is further applied at scale to compute ionic conductivities for 200 polymer electrolytes. Moreover, automated oxidation-window screening of 15 representative polymer electrolytes recovers the experimental ranking of oxidation potentials (Spearman ρ = 0.754; MAE = 0.473 V). With standardized protocols and traceable workflows, PEMD provides a reliable platform for high-throughput screening and data-driven design of solid polymer electrolytes.
{"title":"PEMD: a high-throughput simulation and analysis framework for solid polymer electrolytes","authors":"Shendong Tan, Bochun Liang, Dexin Lu, Chaoyuan Ji, Wenke Ji, Zihui Li and Tingzheng Hou","doi":"10.1039/D5DD00454C","DOIUrl":"https://doi.org/10.1039/D5DD00454C","url":null,"abstract":"<p >Solid polymer electrolytes exhibit limitations in room-temperature ionic conductivity and electrochemical stability. While molecular simulations and electronic-structure theory are able to sample these key properties at the molecular scale, the field currently lacks integrated, automated tools for end-to-end assessment. We introduce polymer electrolyte modeling and discovery (PEMD), an open-source Python framework that unifies polymer construction, force field parameterization, multiscale simulation, and property analysis for polymer electrolytes. The comprehensive analysis suite spans transport properties, transport mechanisms, and electrochemical stability. PEMD achieves a 100% success rate in constructing a collection of 656 homopolymers. The automated molecular dynamics workflow reproduces experimental ionic conductivities for 18 reported systems (Spearman <em>ρ</em> = 0.819; MAE = 0.684 in log 10 (S cm<small><sup>−1</sup></small>)). Specifically, for poly(ethylene oxide)/LiTFSI electrolytes, PEMD captures the canonical non-monotonic dependence of ionic conductivity on salt concentration with built-in default settings. The workflow is further applied at scale to compute ionic conductivities for 200 polymer electrolytes. Moreover, automated oxidation window screening on 15 representative polymer electrolytes recovers experimental rankings for the oxidation potential (Spearman <em>ρ</em> = 0.754; MAE = 0.473 V). 
With standardized protocols and traceable workflows, PEMD provides a reliable platform for high-throughput screening and data-driven design of solid polymer electrolytes.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 193-202"},"PeriodicalIF":6.2,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00454c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
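The two metrics quoted for PEMD can be computed as follows. This is a minimal sketch assuming the standard definitions: Spearman ρ as the Pearson correlation of average ranks (so ties are handled), and MAE taken on log10-transformed conductivities, so that an error of 1.0 means one order of magnitude. The inputs are toy values.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def mae_log10(predicted: list[float], measured: list[float]) -> float:
    """Mean absolute error in decades, e.g. for conductivities in S cm^-1."""
    return sum(abs(math.log10(p) - math.log10(m)) for p, m in zip(predicted, measured)) / len(predicted)


# Toy conductivities (S cm^-1), not values from the study.
pred = [2e-4, 1e-5, 3e-6]
expt = [1e-4, 5e-5, 1e-6]
print(spearman(pred, expt), mae_log10(pred, expt))
```

Working in log10 units keeps a system at 10⁻⁴ S cm⁻¹ from dominating the error over one at 10⁻⁷ S cm⁻¹.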
Gabriel Greene-Diniz, Georgia Prokopiou, David Zsolt Manrique and David Muñoz Ramo
The ability to prepare states for quantum chemistry is a promising feature of quantum computers, and efficient techniques for chemical state preparation are an active area of research. In this paper, we implement and investigate two methods of quantum circuit preparation of multiconfigurational states for quantum chemical applications. It has previously been shown that controlled Givens rotations are universal for quantum chemistry. To prepare a selected linear combination of Slater determinants (represented as occupation number configurations) using Givens rotations, the gates that rotate between the reference and excited determinants must, in general, be controlled on qubits outside the excitation (external controls). We implement a method to automatically find the external controls required to prepare multiconfigurational states with Givens rotations on a quantum circuit. We compare this approach with an alternative technique that exploits the sparsity of the chemical state vector and find that the latter can outperform externally controlled Givens rotations: highly reduced circuits can be obtained by taking advantage of the sparse nature of chemical wavefunctions, in which the number of basis states is significantly less than 2^(n_q) for n_q qubits. We demonstrate the benefits of these techniques in a range of applications, including the ground states of a strongly correlated molecule, matrix elements of the Q-SCEOM algorithm for excited states, and correlated initial states for a quantum subspace method based on quantum computed moments and quantum phase estimation.
{"title":"Quantum state preparation of multiconfigurational states for quantum chemistry","authors":"Gabriel Greene-Diniz, Georgia Prokopiou, David Zsolt Manrique and David Muñoz Ramo","doi":"10.1039/D5DD00350D","DOIUrl":"https://doi.org/10.1039/D5DD00350D","url":null,"abstract":"<p >The ability to prepare states for quantum chemistry is a promising feature of quantum computers, and efficient techniques for chemical state preparation are an active area of research. In this paper, we implement and investigate two methods of quantum circuit preparation for multiconfigurational states for quantum chemical applications. It has previously been shown that controlled Givens rotations are universal for quantum chemistry. To prepare a selected linear combination of Slater determinants (represented as occupation number configurations) using Givens rotations, the gates that rotate between the reference and excited determinants need to be controlled on qubits outside the excitation (external controls), in general. We implement a method to automatically find the external controls required for utilizing Givens rotations to prepare multiconfigurational states on a quantum circuit. We compare this approach to an alternative technique that exploits the sparsity of the chemical state vector and find that the latter can outperform the method of externally controlled Givens rotations; highly reduced circuits can be obtained by taking advantage of the sparse nature (where the number of basis states is significantly less than 2<small><sup><em>n</em><small><sub><em>q</em></sub></small></sup></small> for <em>n</em><small><sub><em>q</em></sub></small> qubits) of chemical wavefunctions. 
We demonstrate the benefits of these techniques in a range of applications, including the ground states of a strongly correlated molecule, matrix elements of the Q-SCEOM algorithm for excited states, as well as correlated initial states for a quantum subspace method based on quantum computed moments and quantum phase estimation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 134-152"},"PeriodicalIF":6.2,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00350d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
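To make the sparsity argument concrete, the sketch below stores a multiconfigurational state as a dictionary from occupation-number bitstrings to amplitudes and applies an idealized two-state Givens rotation. It is a classical illustration of the linear algebra only, assuming real amplitudes and the convention |a⟩ → cos θ|a⟩ + sin θ|b⟩; it does not build a circuit, and the rotation angles are hypothetical. On hardware, the gate implementing such a rotation acts on every configuration that matches the pattern on the excitation qubits, so in general it must be controlled on qubits outside the excitation; the dictionary update here behaves as if those external controls were already in place.

```python
import math

State = dict[str, float]  # occupation-number bitstring -> real amplitude (nonzeros only)


def givens(state: State, a: str, b: str, theta: float) -> State:
    """|a> -> cos(theta)|a> + sin(theta)|b>,  |b> -> -sin(theta)|a> + cos(theta)|b>."""
    c, s = math.cos(theta), math.sin(theta)
    amp_a, amp_b = state.get(a, 0.0), state.get(b, 0.0)
    new = dict(state)
    new[a] = c * amp_a - s * amp_b
    new[b] = s * amp_a + c * amp_b
    return {k: v for k, v in new.items() if abs(v) > 1e-12}  # keep the state sparse


def norm(state: State) -> float:
    return math.sqrt(sum(v * v for v in state.values()))


# Toy target: 4 spin orbitals, 2 electrons; reference |1100> plus two excited determinants.
psi = {"1100": 1.0}
psi = givens(psi, "1100", "0011", math.atan2(0.3, 0.9))  # hypothetical angle
psi = givens(psi, "1100", "1001", 0.2)                    # hypothetical angle
print(len(psi), "nonzero amplitudes out of", 2 ** 4)  # 3 nonzero amplitudes out of 16
```

Because every rotation is unitary, the norm stays 1 while only three of the sixteen basis states ever carry amplitude.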
Hossein Mashhadimoslem, Mohammad Ali Abdol, Kourosh Zanganeh, Ahmed Shafeen, Encheng Liu, Sohrab Zendehboudi, Ali Elkamel and Aiping Yu
This research focuses on efficiently collecting CO2 adsorption data for experimentally studied metal–organic framework (MOF) porous materials from the scientific literature, addressing challenges in data classification and in access to MOF synthesis methods. The aim is to organize, classify, and facilitate access to materials science information using artificial intelligence (AI). Using advanced large language models (LLMs), we developed a systematic approach to extract and sort MOF synthesis data for CO2 adsorption into a structured format. With this method, we collected data from over 433 published experimental research papers and created a dedicated dataset to analyze the effects of metals, ligands, and carbon adsorption conditions on CO2 uptake performance. Using the final database, we examined correlations between CO2 adsorption under various process conditions and material characteristics such as metal type, ligand, specific surface area, pore size, pore volume, and synthesis conditions. We used ChatGPT 4o mini as an AI assistant to text-mine all MOF information from the PDF files of the references. In addition to revealing the impact of each parameter on CO2 uptake and on MOF structure before synthesis, the AI analysis indicated which ligand and metal groups should be altered to tailor the MOF structure for improved CO2 capture.
{"title":"Toward smart CO2 capture by the synthesis of metal organic frameworks using large language models","authors":"Hossein Mashhadimoslem, Mohammad Ali Abdol, Kourosh Zanganeh, Ahmed Shafeen, Encheng Liu, Sohrab Zendehboudi, Ali Elkamel and Aiping Yu","doi":"10.1039/D5DD00446B","DOIUrl":"https://doi.org/10.1039/D5DD00446B","url":null,"abstract":"<p >This research focuses on efficiently collecting CO<small><sub>2</sub></small> adsorption data using experimental metal–organic framework (MOF) porous materials from the scientific literature, addressing the challenges related to data classification and access to MOF synthesis methods. The aim is to organize, classify, and facilitate easy access to materials science information using artificial intelligence (AI). Using advanced large language models (LLMs), we developed a systematic approach to extract and sort MOF synthesis data for CO<small><sub>2</sub></small> adsorption in a structured format. Using this method, we collected data from over 433 published experimental research papers and created a specific dataset to analyze the effects of metals, ligands, and carbon adsorption conditions on CO<small><sub>2</sub></small> uptake performance. The correlations between the material structure, such as metal types, ligands, specific surface area, pore size, pore volume, synthesis conditions, and CO<small><sub>2</sub></small> adsorption, under various process conditions were examined using the final database. We applied ChatGPT 4o mini as an AI assistant to text-mine all MOF information from different PDF file references. In addition to revealing the impact of each parameter on CO<small><sub>2</sub></small> uptake and MOF structure before synthesis, the AI analysis findings indicated which ligand and metal groups should be altered to customize the MOF structure for improved CO<small><sub>2</sub></small> capture.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 384-396"},"PeriodicalIF":6.2,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00446b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
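The extraction step in the abstract turns free-text papers into structured records, which are then correlated with CO2 uptake. Below is a minimal Python sketch of that idea. It assumes the LLM has been prompted to return a JSON list of objects. The field names, units, and the functions `parse_llm_output` and `uptake_vs_surface_area` are hypothetical and do not reproduce the authors' schema or code. Pearson correlation is used only as one simple choice of correlation measure.

```python
import json
import math
from dataclasses import dataclass
from typing import Optional


# Hypothetical record schema; field names and units are illustrative assumptions.
@dataclass
class MOFRecord:
    mof_name: str
    metal: str
    ligand: str
    surface_area_m2_g: Optional[float]  # specific surface area, may be unreported
    pore_volume_cm3_g: Optional[float]  # may be unreported
    temperature_k: float
    pressure_bar: float
    co2_uptake_mmol_g: float


REQUIRED = ("mof_name", "metal", "ligand", "temperature_k", "pressure_bar", "co2_uptake_mmol_g")


def _opt_float(value) -> Optional[float]:
    return None if value is None else float(value)


def parse_llm_output(raw: str) -> list[MOFRecord]:
    """Parse an LLM reply expected to be a JSON list of objects.

    Entries missing any required field are skipped rather than guessed.
    """
    records = []
    for item in json.loads(raw):
        if any(item.get(key) is None for key in REQUIRED):
            continue
        records.append(MOFRecord(
            mof_name=str(item["mof_name"]),
            metal=str(item["metal"]),
            ligand=str(item["ligand"]),
            surface_area_m2_g=_opt_float(item.get("surface_area_m2_g")),
            pore_volume_cm3_g=_opt_float(item.get("pore_volume_cm3_g")),
            temperature_k=float(item["temperature_k"]),
            pressure_bar=float(item["pressure_bar"]),
            co2_uptake_mmol_g=float(item["co2_uptake_mmol_g"]),
        ))
    return records


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for zero-variance data")
    return sxy / math.sqrt(sxx * syy)


def uptake_vs_surface_area(records: list[MOFRecord], temperature_k: float,
                           pressure_bar: float) -> float:
    """Correlate CO2 uptake with surface area at a single (T, P) condition."""
    pairs = [(r.surface_area_m2_g, r.co2_uptake_mmol_g) for r in records
             if r.surface_area_m2_g is not None
             and math.isclose(r.temperature_k, temperature_k)
             and math.isclose(r.pressure_bar, pressure_bar)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])
```

Records missing a required field are dropped rather than imputed. Correlations are computed within one temperature/pressure condition only, because uptakes measured under different conditions are not directly comparable.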
Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan
One of the main bottlenecks for the wide adoption of atomistic simulation pipelines for computational materials design is the complexity of their workflows, which often require a diverse set of specialized toolkits and libraries. Here, we introduce a multi-agent artificial intelligence (AI) framework that autonomously performs end-to-end atomistic simulations, i.e. molecular dynamics (MD), with automated input generation and a full suite of associated analyses, using large language models (LLMs) and multiple specialized AI agents. Our system orchestrates the entire simulation pipeline, from structure generation via Atomsk and interatomic potential discovery through automated web mining, to simulation setup and execution using LAMMPS on high-performance computing (HPC) platforms. After the simulation, the framework performs automated data analysis and visualization with widely used tools such as OVITO and Phonopy. Each expert agent operates within a defined role, equipped with domain-specific functions and a shared memory context for coordination. Using a diverse set of representative elemental and alloy systems, we demonstrate that the framework can execute a range of static and dynamic materials modeling tasks. These include estimation of lattice parameters and cohesive energies, computation of elastic constants, and phonon dispersion analysis, as well as MD simulations of dynamical properties that aid in estimating the melting point. The results produced by the agents show strong agreement with those obtained by a human expert, highlighting the reliability of the agentic approach. By combining automation, reproducibility, and human-in-the-loop control, our framework lowers the barrier to the widespread adoption of scalable, AI-driven discovery tools in materials science.
{"title":"Multi-agentic AI framework for end-to-end atomistic simulations","authors":"Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan","doi":"10.1039/D5DD00435G","DOIUrl":"https://doi.org/10.1039/D5DD00435G","url":null,"abstract":"<p >One of the main bottlenecks for the wide adoption of atomistic simulation pipelines for computational materials design is the high complexity of the workflows which many times requires the use of a diverse set of specialized toolkits and libraries. Here, we introduce a multi-agent artificial intelligence (AI) framework that autonomously performs end-to-end atomistic simulations, <em>i.e.</em> molecular dynamics (MD), with automated input and associated full suite of analyses, using large language models (LLMs) and multiple specialized AI agents. Our system orchestrates the entire simulation pipeline, from structure generation <em>via</em> Atomsk and interatomic potential discovery through automated web mining, to simulation setup and execution using LAMMPS on high-performance computing (HPC) platforms. Post-simulation, our agentic framework performs automated data analysis and visualization with popular analysis tools like OVITO and Phonopy. Each expert agent operates within a defined role, equipped with domain-specific functions and a shared memory context for coordination. Using a diverse set of representative elemental and alloy systems, we demonstrate the capability of our framework to execute a range of static and dynamic materials modeling tasks, including lattice parameter and cohesive energy estimation, elastic constants computation, phonon dispersion analysis, as well as perform MD simulations to determine dynamical properties that aid estimation of melting point. The results produced by the agents show strong agreement with those obtained by a human expert, highlighting the reliability of the agentic approach. By combining automation, reproducibility, and human-in-the-loop control, our framework lowers the barrier to the widespread adoption of scalable, AI-driven discovery tools in materials science.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 440-452"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00435g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
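The orchestration pattern in this abstract can be sketched in Python: role-specific agents coordinate through a shared memory context, and a human can stop the run. This is an illustrative assumption, not the authors' framework. The LLM-driven agents are replaced by deterministic stub functions, and the Atomsk and web-mining steps are placeholders. The potential file name pattern `<element>.eam.alloy` is hypothetical. The generated LAMMPS input is only rendered as text and is never executed.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Shared memory context: one dict that every agent reads from and writes to.
Memory = dict


@dataclass
class Agent:
    role: str
    requires: tuple[str, ...]  # memory keys that must exist before running
    produces: tuple[str, ...]  # memory keys the agent must write
    run: Callable[[Memory], None]


def build_structure(mem: Memory) -> None:
    # Stub standing in for an Atomsk call: only describes the crystal to build.
    mem["structure"] = {"element": mem["element"], "lattice": "fcc",
                        "a0": mem["a0_guess"], "cells": (4, 4, 4)}


def find_potential(mem: Memory) -> None:
    # Stub standing in for web mining; the file name is a hypothetical placeholder.
    mem["potential_file"] = f"{mem['element']}.eam.alloy"


def write_lammps_input(mem: Memory) -> None:
    # Render a minimal LAMMPS input that relaxes the box (lattice-parameter task).
    s = mem["structure"]
    nx, ny, nz = s["cells"]
    mem["lammps_input"] = "\n".join([
        "units metal",
        "boundary p p p",
        "atom_style atomic",
        f"lattice {s['lattice']} {s['a0']}",
        f"region box block 0 {nx} 0 {ny} 0 {nz}",
        "create_box 1 box",
        "create_atoms 1 box",
        "pair_style eam/alloy",
        f"pair_coeff * * {mem['potential_file']} {s['element']}",
        "fix relax all box/relax iso 0.0",
        "minimize 1.0e-10 1.0e-10 1000 10000",
    ])


AGENTS = [
    Agent("structure", ("element", "a0_guess"), ("structure",), build_structure),
    Agent("potential", ("element",), ("potential_file",), find_potential),
    Agent("setup", ("structure", "potential_file"), ("lammps_input",), write_lammps_input),
]


def run_pipeline(agents: list[Agent], mem: Memory,
                 approve: Optional[Callable[[str, Memory], bool]] = None) -> list[str]:
    """Run agents in order; stop early if the human reviewer declines a step."""
    completed = []
    for agent in agents:
        if approve is not None and not approve(agent.role, mem):
            break  # human-in-the-loop: the reviewer halted the pipeline
        missing = [k for k in agent.requires if k not in mem]
        if missing:
            raise KeyError(f"{agent.role} is missing inputs: {missing}")
        agent.run(mem)
        unwritten = [k for k in agent.produces if k not in mem]
        if unwritten:
            raise RuntimeError(f"{agent.role} did not produce: {unwritten}")
        completed.append(agent.role)
    return completed
```

Each agent declares its `requires` and `produces` keys, so the orchestrator can fail early when a step's inputs are missing. The optional `approve` callback is one simple way to model human-in-the-loop control between steps.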