Digital Discovery: latest articles

Data augmentation in a triple transformer loop retrosynthesis model

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2026-01-21 DOI: 10.1039/d5dd00465a

Yves Grandjean, David Kreutter, Jean-Louis Reymond

Reactions in the US Patent Office (USPTO) dataset are biased towards a few over-represented reaction types, which potentially limits their usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for the SM → P reaction. Finally, we validated each prediction by requiring TTL transformer T3 to predict P from SM + R with high confidence (>95%). We generated up to 5000 reactions per template, resulting in 27.5 million validated fictive reactions covering the chemical space of the original USPTO dataset. To exemplify the use of this dataset, we demonstrate that a single-step retrosynthesis transformer model trained on a template-equilibrated subset of 1 097 374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.
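
The generate, annotate, and validate loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `apply_template`, `predict_reagents` (standing in for T2), and `forward_predict` (standing in for T3) are hypothetical callables supplied by the caller, and the `toy_*` functions use plain strings instead of real molecules so that the sketch runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (starting materials, reagents, product)
Reaction = Tuple[Tuple[str, ...], str, str]


def augment(
    products: Iterable[str],
    templates: Dict[str, str],
    apply_template: Callable[[str, str], List[Tuple[str, ...]]],
    predict_reagents: Callable[[Tuple[str, ...], str], str],
    forward_predict: Callable[[Tuple[str, ...], str], Tuple[str, float]],
    min_confidence: float = 0.95,
    max_per_template: int = 5000,
) -> Dict[str, List[Reaction]]:
    """Collect validated fictive reactions, capped per template."""
    products = list(products)
    validated: Dict[str, List[Reaction]] = {}
    for template_id, template in templates.items():
        kept: List[Reaction] = []
        for product in products:
            # Step 1: a retrosynthesis template proposes starting materials.
            for sm in apply_template(template, product):
                if len(kept) >= max_per_template:
                    break
                # Step 2: the T2 stand-in proposes reagents for SM -> P.
                reagents = predict_reagents(sm, product)
                # Step 3: the T3 stand-in must recover P with high confidence.
                predicted, confidence = forward_predict(sm, reagents)
                if predicted == product and confidence > min_confidence:
                    kept.append((sm, reagents, product))
            if len(kept) >= max_per_template:
                break
        validated[template_id] = kept
    return validated


# Toy stand-ins (hypothetical): a "template" is just a separator character.
def toy_apply_template(template, product):
    return [tuple(product.split(template))] if template in product else []


def toy_predict_reagents(sm, product):
    return "R"


def toy_forward_predict(sm, reagents):
    confidence = 0.99 if len(sm) == 2 else 0.50
    return "-".join(sm), confidence
```

Capping the number of reactions kept per template is what equilibrates the result: no single template can dominate the generated set, whatever its frequency in the source data.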

Citations: 0

Assessing the performance of quantum-mechanical descriptors in physicochemical and biological property prediction

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2026-01-19 DOI: 10.1039/d5dd00411j

Alejandra Hinostroza Caldas, Artem Kokorin, Alexandre Tkatchenko, Leonardo Medrano Sandonas

Machine learning (ML) approaches have drastically advanced the exploration of structure-property and property-property relationships in computer-aided drug discovery. A central challenge in this field is the identification of molecular descriptors that can effectively capture both geometric- and electronic structure-derived features, enabling the development of reliable and interpretable predictive models. While numerous descriptors focusing solely on structural characteristics have been recently proposed, improvements in model accuracy often come at the cost of increased computational demands, thereby restricting their practical applicability. To address this challenge, we introduce the "QUantum Electronic Descriptor" (QUED) framework, which integrates both structural and electronic data of molecules to develop ML regression models for property prediction. In doing so, a quantum-mechanical (QM) descriptor is derived from molecular and atomic properties computed using the semi-empirical density functional tight-binding (DFTB) method, which allows for efficient modelling of both small and large drug-like molecules. This descriptor is combined with inexpensive geometric descriptors, which capture two-body and three-body interatomic interactions, to form comprehensive molecular representations used to train Kernel Ridge Regression and XGBoost models. As a proof of concept, we validate QUED using the QM7-X dataset, which comprises equilibrium and non-equilibrium conformations of small drug-like molecules, demonstrating that incorporating electronic structure data notably enhances the accuracy of ML models for predicting physicochemical properties. For biological endpoints, we find that QM properties provide some predictive value for toxicity and lipophilicity prediction, as assessed using the TDCommons-LD₅₀ and the MoleculeNet benchmark datasets. Moreover, a SHapley Additive exPlanations (SHAP) analysis of the toxicity and lipophilicity predictive models reveals that molecular orbital energies and DFTB energy components are among the most influential electronic features. Hence, our work underscores the importance of incorporating QM descriptors to enhance both the accuracy and interpretability of ML models for predicting multiple properties relevant to pharmaceutical and biological applications.
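
To make the idea of joining geometric and electronic descriptors into one representation and fitting a kernel ridge regression concrete, here is a small pure-Python sketch. It is an illustration under assumptions rather than the QUED implementation: the Gaussian kernel, the values of `sigma` and `lam`, and the helper names (`combine`, `fit_krr`, `predict_krr`) are all choices made here. In practice one would more likely use an established implementation such as scikit-learn's `KernelRidge`.

```python
import math


def combine(geometric, electronic):
    """Concatenate a geometric and an electronic descriptor vector."""
    return list(geometric) + list(electronic)


def gaussian_kernel(x, y, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(X, y, sigma=1.0, lam=1e-6):
    """Return regression weights alpha solving (K + lam * I) alpha = y."""
    K = [
        [gaussian_kernel(a, b, sigma) + (lam if i == j else 0.0)
         for j, b in enumerate(X)]
        for i, a in enumerate(X)
    ]
    return solve(K, list(y))


def predict_krr(X_train, alpha, x, sigma=1.0):
    return sum(a * gaussian_kernel(xt, x, sigma) for a, xt in zip(alpha, X_train))
```

With a very small `lam` the model nearly interpolates its training points; larger values trade training accuracy for smoother predictions.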

Citations: 0

Correction: Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-17 DOI: 10.1039/D5DD90058A

Luis H. M. Torres, Sofia M. da Silva, Joel P. Arrais, Catarina Pimentel and Bernardete Ribeiro

Correction for ‘Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework’ by Luis H. M. Torres et al., Digital Discovery, 2025, 4, 3515–3532, https://doi.org/10.1039/D5DD00276A.

Citations: 0

One step retrosynthesis of drugs from commercially available chemical building blocks and conceivable coupling reactions

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-16 DOI: 10.1039/D5DD00310E

Babak Mahjour, Felix Katzenburg, Emil Lammi and Tim Cernak

In this report, the pharmaceuticals listed in DrugBank were structurally mapped to a commercial catalog of chemical feedstocks through reaction-agnostic, one-step retrosynthetic decomposition. Enumerative combinatorics was utilized to retrosynthesize target molecules into commercially available building blocks, wherein only the bond formed and the minimal substructure template of each building block class are considered. In contrast to the status quo in automated retrosynthesis, our algorithm may suggest reactions that do not yet exist but, if they did, could enable the synthesis of drugs in just one reaction step from commercial feedstocks. Cross-referencing synthons to commercial datasets can thus reveal valuable reaction classes for development in addition to streamlining drug production. Decomposed synthons were linked to target molecules by transformations that form one bond after the elimination of each synthon's respective reactive functional handle, as indicated by their building block class. Specific reactivities were analyzed after post hoc refinement and clustering of commercial synthons. Maps between boronates, bromides, iodides, amines, acids, chlorides, alcohols, and various C–H motifs to form alkyl–alkyl, alkyl–aryl, and aryl–aryl carbon–carbon, carbon–nitrogen, and carbon–oxygen bonds are reported herein, with specific examples for each provided.

Citations: 0

Deep learning based SEM image analysis for predicting ionic conductivity in LiZr₂(PO₄)₃-based solid electrolytes

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-15 DOI: 10.1039/D5DD00232J

Kento Murakami, Yudai Yamaguchi, Yo Kato, Kazuki Ishikawa, Naoto Tanibata, Hayami Takeda, Masanobu Nakayama and Masayuki Karasuyama

Lithium-ion-conductive oxide materials have attracted considerable attention as solid electrolytes for all-solid-state batteries. In particular, LiZr₂(PO₄)₃-related compounds are promising for high-energy-density devices using metallic lithium anodes, but further enhancement of their ionic conductivity is still required. In general, Li-ion conductivity is influenced by mechanisms operating on two distinct length scales. At the atomic scale, point defects and the associated migration barriers within the crystal lattice are critical, whereas at the micrometre scale, porosity and grain-boundary characteristics that develop during sintering become the dominant factors. These coupled effects make systematic optimization of conductivity difficult. In particular, microstructural analysis has often relied on researchers' intuitive interpretation of scanning electron microscopy (SEM) images. Here, we apply a convolutional neural network (CNN), a deep-learning approach that has seen rapid advances in image analysis, to SEM images of LiZr₂(PO₄)₃-based electrolytes. By combining image-derived features with conventional vector descriptors (composition, sintering parameters, etc.), our regression model achieved an R² of 0.871. Furthermore, visual-interpretability analysis of the trained CNN revealed that grain-boundary regions were highlighted as low-conductivity areas. These findings demonstrate that deep-learning-based SEM analysis enables automated, quantitative evaluation of ionic conductivity and offers a powerful tool for accelerating the development of solid electrolyte materials.
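
For readers less familiar with the metric, the R² quoted in this abstract is presumably the usual coefficient of determination, R² = 1 − SS_res / SS_tot. The abstract does not state which variant was used, so the conventional definition is an assumption here. A short sketch, with `r_squared` as an illustrative helper name:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions match the measurements exactly, while 0 means the model does no better than predicting the mean of the measured values.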

Citations: 0

Efficient simulation of complex fluid phase diagrams with Bayesian optimization

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-15 DOI: 10.1039/D5DD00150A

Steven G. Arturo, Clyde Fare, Kaoru Aou, Dan Dermody, Will Edsall, Jillian Emerson, Kathryn Grzesiak, Arjita Kulshreshtha, Paul Mwasame, Edward O. Pyzer-Knapp and Jed Pitera

Phase diagrams of complex fluids are essential tools for understanding solubility and miscibility. Using a new objective function coupled with a constrained Bayesian optimization algorithm, we demonstrate the efficient location of phase boundaries in a sample two-phase ternary system modeled using polymer self-consistent field theory, regularly requiring 50% fewer observations than an exhaustive search. Our approach is general, gradient-free, and can be applied to either simulation or experimental campaigns.

Citations: 0

PEMD: a high-throughput simulation and analysis framework for solid polymer electrolytes

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-12 DOI: 10.1039/D5DD00454C

Shendong Tan, Bochun Liang, Dexin Lu, Chaoyuan Ji, Wenke Ji, Zihui Li and Tingzheng Hou

Solid polymer electrolytes exhibit limitations in room-temperature ionic conductivity and electrochemical stability. While molecular simulations and electronic-structure theory are able to sample these key properties at the molecular scale, the field currently lacks integrated, automated tools for end-to-end assessment. We introduce polymer electrolyte modeling and discovery (PEMD), an open-source Python framework that unifies polymer construction, force field parameterization, multiscale simulation, and property analysis for polymer electrolytes. The comprehensive analysis suite spans transport properties, transport mechanisms, and electrochemical stability. PEMD achieves a 100% success rate in constructing a collection of 656 homopolymers. The automated molecular dynamics workflow reproduces experimental ionic conductivities for 18 reported systems (Spearman ρ = 0.819; MAE = 0.684 in log₁₀(S cm⁻¹)). Specifically, for poly(ethylene oxide)/LiTFSI electrolytes, PEMD captures the canonical non-monotonic dependence of ionic conductivity on salt concentration with built-in default settings. The workflow is further applied at scale to compute ionic conductivities for 200 polymer electrolytes. Moreover, automated oxidation window screening on 15 representative polymer electrolytes recovers experimental rankings for the oxidation potential (Spearman ρ = 0.754; MAE = 0.473 V). With standardized protocols and traceable workflows, PEMD provides a reliable platform for high-throughput screening and data-driven design of solid polymer electrolytes.
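
The two agreement metrics quoted in this abstract, Spearman's rank correlation ρ and the mean absolute error (MAE) on a log₁₀ scale, can be computed as in the sketch below. It does not use the PEMD API; the function names are illustrative, and it assumes that conductivities are compared after taking log₁₀ of values in S cm⁻¹, as the reported units suggest.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mae_log10(predicted, reference):
    """Mean absolute error between log10 values (orders of magnitude)."""
    errors = [abs(math.log10(p) - math.log10(r))
              for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)
```

Read this way, an MAE of 1.0 on the log₁₀ scale would correspond to predictions that are off by one order of magnitude on average, while ρ measures only whether the ordering of systems is reproduced.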

Citations: 0

Quantum state preparation of multiconfigurational states for quantum chemistry

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-12 DOI: 10.1039/D5DD00350D

Gabriel Greene-Diniz, Georgia Prokopiou, David Zsolt Manrique and David Muñoz Ramo

The ability to prepare states for quantum chemistry is a promising feature of quantum computers, and efficient techniques for chemical state preparation are an active area of research. In this paper, we implement and investigate two methods of quantum circuit preparation for multiconfigurational states for quantum chemical applications. It has previously been shown that controlled Givens rotations are universal for quantum chemistry. To prepare a selected linear combination of Slater determinants (represented as occupation number configurations) using Givens rotations, the gates that rotate between the reference and excited determinants need to be controlled on qubits outside the excitation (external controls), in general. We implement a method to automatically find the external controls required for utilizing Givens rotations to prepare multiconfigurational states on a quantum circuit. We compare this approach to an alternative technique that exploits the sparsity of the chemical state vector and find that the latter can outperform the method of externally controlled Givens rotations; highly reduced circuits can be obtained by taking advantage of the sparse nature (where the number of basis states is significantly less than 2^n_q for n_q qubits) of chemical wavefunctions. We demonstrate the benefits of these techniques in a range of applications, including the ground states of a strongly correlated molecule, matrix elements of the Q-SCEOM algorithm for excited states, as well as correlated initial states for a quantum subspace method based on quantum computed moments and quantum phase estimation.
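
A classical toy model helps show what a (controlled) Givens rotation does to a sparse multiconfigurational state. The sketch below stores only the non-zero amplitudes, keyed by occupation-number tuples, which is the sparsity the abstract refers to. It is an illustration, not the paper's circuit construction: the sign convention of the rotation, the `controls` argument, and the omission of fermionic phase factors are simplifying assumptions made here.

```python
import math


def givens(state, i, j, theta, controls=()):
    """Rotate amplitude between occupations that differ on qubits i and j.

    `state` maps occupation tuples, e.g. (1, 1, 0, 0), to real amplitudes.
    Assumed convention: |1_i 0_j> -> cos|1_i 0_j> + sin|0_i 1_j> and
    |0_i 1_j> -> cos|0_i 1_j> - sin|1_i 0_j>. Configurations with equal
    occupations on i and j, or with any control qubit unoccupied, are
    left unchanged.
    """
    c, s = math.cos(theta), math.sin(theta)
    out = {}
    for occ, amp in state.items():
        active = occ[i] != occ[j] and all(occ[q] == 1 for q in controls)
        if not active:
            out[occ] = out.get(occ, 0.0) + amp
            continue
        swapped = list(occ)
        swapped[i], swapped[j] = occ[j], occ[i]
        swapped = tuple(swapped)
        sign = 1.0 if occ[i] == 1 else -1.0
        out[occ] = out.get(occ, 0.0) + c * amp
        out[swapped] = out.get(swapped, 0.0) + sign * s * amp
    # Drop numerically zero entries so the representation stays sparse.
    return {occ: amp for occ, amp in out.items() if abs(amp) > 1e-12}
```

Starting from a single reference configuration such as `{(1, 1, 0, 0): 1.0}`, one rotation yields two configurations out of the 2^4 = 16 possible basis states, and the particle number is conserved because the rotation only moves an occupation from one qubit to another.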

Citations: 0

Toward smart CO2 capture by the synthesis of metal organic frameworks using large language models

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-11 DOI: 10.1039/D5DD00446B

Hossein Mashhadimoslem, Mohammad Ali Abdol, Kourosh Zanganeh, Ahmed Shafeen, Encheng Liu, Sohrab Zendehboudi, Ali Elkamel and Aiping Yu

This research focuses on efficiently collecting CO₂ adsorption data using experimental metal–organic framework (MOF) porous materials from the scientific literature, addressing the challenges related to data classification and access to MOF synthesis methods. The aim is to organize, classify, and facilitate easy access to materials science information using artificial intelligence (AI). Using advanced large language models (LLMs), we developed a systematic approach to extract and sort MOF synthesis data for CO₂ adsorption in a structured format. Using this method, we collected data from over 433 published experimental research papers and created a specific dataset to analyze the effects of metals, ligands, and carbon adsorption conditions on CO₂ uptake performance. The correlations between the material structure, such as metal types, ligands, specific surface area, pore size, pore volume, synthesis conditions, and CO₂ adsorption, under various process conditions were examined using the final database. We applied ChatGPT 4o mini as an AI assistant to text-mine all MOF information from different PDF file references. In addition to revealing the impact of each parameter on CO₂ uptake and MOF structure before synthesis, the AI analysis findings indicated which ligand and metal groups should be altered to customize the MOF structure for improved CO₂ capture.
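
Extracting literature data "in a structured format" usually means asking the model for machine-readable output and validating it before it enters the database. The sketch below shows one possible validation step. The field names, units, and the `parse_records` helper are hypothetical and not taken from the paper, and the JSON string stands in for whatever the language model returned.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MOFRecord:
    # Hypothetical schema; the paper's actual fields may differ.
    name: str
    metal: str
    ligand: str
    surface_area_m2_g: Optional[float] = None
    pore_volume_cm3_g: Optional[float] = None
    co2_uptake_mmol_g: Optional[float] = None
    temperature_K: Optional[float] = None
    pressure_bar: Optional[float] = None


REQUIRED = ("name", "metal", "ligand")


def parse_records(llm_json: str) -> List[MOFRecord]:
    """Keep only entries with all required fields; ignore unknown keys."""
    allowed = {f.name for f in fields(MOFRecord)}
    records = []
    for item in json.loads(llm_json):
        if any(not item.get(key) for key in REQUIRED):
            continue  # incomplete extraction: skip rather than guess
        known = {key: value for key, value in item.items() if key in allowed}
        records.append(MOFRecord(**known))
    return records
```

Rejecting incomplete entries instead of filling them in keeps extraction errors visible, which matters when the resulting table is later used for correlation analysis.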

Citations: 0

Multi-agentic AI framework for end-to-end atomistic simulations

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery

Pub Date : 2025-12-09 DOI: 10.1039/D5DD00435G

Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan

One of the main bottlenecks to the wide adoption of atomistic simulation pipelines for computational materials design is the high complexity of the workflows, which often require a diverse set of specialized toolkits and libraries. Here, we introduce a multi-agent artificial intelligence (AI) framework that autonomously performs end-to-end atomistic simulations, i.e. molecular dynamics (MD), with automated input generation and an associated full suite of analyses, using large language models (LLMs) and multiple specialized AI agents. Our system orchestrates the entire simulation pipeline, from structure generation via Atomsk and interatomic potential discovery through automated web mining, to simulation setup and execution using LAMMPS on high-performance computing (HPC) platforms. After the simulation, our agentic framework performs automated data analysis and visualization with popular analysis tools such as OVITO and Phonopy. Each expert agent operates within a defined role, equipped with domain-specific functions and a shared memory context for coordination. Using a diverse set of representative elemental and alloy systems, we demonstrate that the framework can execute a range of static and dynamic materials modeling tasks, including lattice parameter and cohesive energy estimation, elastic constant computation, and phonon dispersion analysis, as well as MD simulations that determine dynamical properties used to estimate the melting point. The results produced by the agents show strong agreement with those obtained by a human expert, highlighting the reliability of the agentic approach. By combining automation, reproducibility, and human-in-the-loop control, our framework lowers the barrier to the widespread adoption of scalable, AI-driven discovery tools in materials science.
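The abstract describes role-bound agents that coordinate through a shared memory context but gives no implementation details. The sketch below is one plausible way to model that pattern in plain Python. The `SharedMemory` and `Agent` classes, the four stub roles, and the placeholder file names are all assumptions made for illustration; no LLM, Atomsk, LAMMPS, OVITO, or Phonopy call is made, and this is not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedMemory:
    """Context shared by all agents: a key-value store plus a write log."""
    data: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def write(self, agent: str, key: str, value: object) -> None:
        self.data[key] = value
        self.log.append(f"{agent} -> {key}")


@dataclass
class Agent:
    """An agent with a fixed role, declared inputs, and one domain function."""
    role: str
    needs: List[str]
    act: Callable[[SharedMemory], None]

    def run(self, memory: SharedMemory) -> None:
        # Refuse to act until upstream agents have written the required keys.
        missing = [key for key in self.needs if key not in memory.data]
        if missing:
            raise RuntimeError(f"{self.role} is missing inputs: {missing}")
        self.act(memory)


# Stub domain functions (hypothetical): each only records a placeholder value.
def build_structure(mem: SharedMemory) -> None:
    mem.write("structure", "structure_file", "placeholder.lmp")


def find_potential(mem: SharedMemory) -> None:
    mem.write("potential", "potential_file", "placeholder.eam")


def write_input(mem: SharedMemory) -> None:
    script = f"# uses {mem.data['structure_file']} and {mem.data['potential_file']}"
    mem.write("simulation", "input_script", script)


def analyze(mem: SharedMemory) -> None:
    mem.write("analysis", "report", f"analysed run defined by: {mem.data['input_script']}")


PIPELINE = [
    Agent("structure", [], build_structure),
    Agent("potential", [], find_potential),
    Agent("simulation", ["structure_file", "potential_file"], write_input),
    Agent("analysis", ["input_script"], analyze),
]


def run_pipeline(agents: List[Agent]) -> SharedMemory:
    """Run agents in order against a fresh shared memory and return it."""
    memory = SharedMemory()
    for agent in agents:
        agent.run(memory)
    return memory
```

Calling `run_pipeline(PIPELINE)` runs the four stubs in sequence. Declaring each agent's required inputs makes a missing upstream result fail loudly instead of silently producing an incomplete run, and the write log gives a simple audit trail that a human reviewer could inspect between steps.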


Pages : 440-452 PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00435g?page=search

Citations: 0