Digital Discovery: latest articles

An improved machine learning strategy using structural features to predict the glass transition temperature of oxide glasses
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-24 DOI: 10.1039/D5DD00326A
Satwinder Singh Danewalia and Kulvir Singh

We present a physics-informed machine learning approach to predict the glass transition temperature (Tg) of sodium borosilicate glasses. Four models (random forest, extreme gradient boosting, support vector machines, and K-nearest neighbors) were trained using both compositional and structural features derived from statistical mechanics. Incorporating these structural descriptors significantly improved model performance. This is evident from the reduction in mean absolute error (14.85 K → 13.76 K) and root mean square error (21.78 → 19.12) and the increase in R² (0.88 → 0.91), measured on the test dataset for the random forest model. Similar improvements were seen for the other models. Building on this, we propose a three-step predictive strategy that enhances generalization across compositions and accurately predicts the Tg of unseen compositions, achieving a mean absolute error of approximately 8 K and an R² value of around 0.98. Our method demonstrates improved accuracy when benchmarked against GlassNet, which represents the current state of the art in property prediction for glasses. These results highlight the importance of considering structural information in improving the prediction capabilities of machine learning models for composition-specific small datasets. This approach can assist in the rapid screening and design of glass materials, reducing the reliance on time-consuming experiments and guiding future research toward targeted property optimization.
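For readers unfamiliar with the three error measures quoted in this abstract, the following is a minimal sketch of how mean absolute error, root mean square error, and R² are commonly computed. The function name and the Tg values are hypothetical and invented purely for illustration; they are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired lists of measured and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical Tg values in kelvin, invented for illustration only.
tg_measured = [750.0, 780.0, 810.0, 840.0]
tg_predicted = [760.0, 770.0, 815.0, 835.0]

mae, rmse, r2 = regression_metrics(tg_measured, tg_predicted)
print(f"MAE={mae:.2f} K  RMSE={rmse:.2f} K  R2={r2:.3f}")
```

Because RMSE squares each residual before averaging, it penalizes large individual errors more strongly than MAE, which is why the two values reported above differ.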

Journal: Digital Discovery. Volume: 12. Pages: 3764-3773. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00326a?page=search
Citations: 0
Generalized DeepONets for viscosity prediction using learned entropy scaling references
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-22 DOI: 10.1039/D5DD00179J
Maximiliam Fleck, Marcelle B. M. Spera, Samir Darouich, Timo Klenk and Niels Hansen

Data-driven approaches used to predict thermophysical properties benefit from physical constraints because the extrapolation behavior can be improved and the amount of training data can be reduced. In the present work, the well-established entropy scaling approach is incorporated into a neural network architecture to predict the shear viscosity of a diverse set of pure fluids over a large temperature and pressure range. Instead of imposing a particular form of the reference entropy and reference shear viscosity, these properties are learned. The resulting architecture can be interpreted as two linked DeepONets with generalization capabilities.

Journal: Digital Discovery. Volume: 12. Pages: 3578-3587. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00179j?page=search
Citations: 0
GoFlow: efficient transition state geometry prediction with flow matching and E(3)-equivariant neural networks
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-21 DOI: 10.1039/D5DD00283D
Leonard Galustian, Konstantin Mark, Johannes Karwounopoulos, Maximilian P.-P. Kovar and Esther Heid

Transition state (TS) geometries of chemical reactions are key to understanding reaction mechanisms and estimating kinetic properties. Inferring these directly from 2D reaction graphs offers chemists a powerful tool for rapid and accessible reaction analysis. Quantum chemical methods for computing TSs are computationally intensive and often infeasible for larger molecular systems. Recently, deep learning-based diffusion models have shown promise in generating TSs from 2D reaction graphs for single-step reactions. However, framing TS generation as a diffusion process, by design, requires a prohibitively large number of sampling steps during inference. Here we show that modeling TS generation as an optimal transport flow problem, solved via E(3)-equivariant flow matching with geometric tensor networks, achieves over a hundredfold speedup in inference while improving geometric accuracy compared to the state-of-the-art. This breakthrough increase in sampling efficiency and predictive accuracy enables the practical use of deep learning-based TS generators in high-throughput settings for larger and more complex chemical systems. Our method, GoFlow, thus represents a significant methodological advancement in machine learning-based TS generation, bringing it closer to widespread use in computational chemistry workflows.
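The sampling-cost argument in this abstract can be illustrated with a toy sketch of flow matching along a straight path. This is not the GoFlow implementation: there is no equivariant network here, and `oracle_velocity` is a hypothetical stand-in for a trained model that happens to know the target point. All function names and coordinates are assumptions made for illustration.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for a velocity model on the straight path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def oracle_velocity(x, t, x1):
    """Hypothetical stand-in for a trained network: exact velocity toward x1."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def sample(x0, x1, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # always < 1, so oracle_velocity never divides by zero
        v = oracle_velocity(x, t, x1)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


# Toy 3D "geometry": start from an initial guess and flow to a target point.
start = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]
print(sample(start, target, n_steps=5))
```

Because the path between start and target is a straight line, a handful of Euler steps already lands on the target in this toy setting. A diffusion sampler, by contrast, follows a curved stochastic trajectory and typically needs many more steps.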

Journal: Digital Discovery. Volume: 12. Pages: 3492-3501. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12580847/pdf/
Citations: 0
Material dynamics analysis with deep generative model
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-20 DOI: 10.1039/D5DD00277J
Duc-Anh Dao, Minh-Quyet Ha, Tien-Sinh Vu, Shuntaro Takazawa, Nozomu Ishiguro, Yukio Takahashi, Suzuki Masato, Takashi Kakubo, Naoya Amino, Hirosuke Matsui, Mizuki Tada and Hieu-Chi Dam

Understanding nanoscale material evolution—including phase transitions, structural deformations, and chemical reactions—under dynamic conditions remains a fundamental challenge in materials science. While advanced imaging techniques enable visualization of transformation processes, they typically capture only discrete temporal observations at specific time intervals. Consequently, intermediate stages and alternative pathways between captured images often remain unresolved, introducing ambiguity in analyzing material dynamics and transformation mechanisms. To address these limitations, we present a two-stage framework using deep generative models to probabilistically reconstruct intermediate transformations. Our framework is based on the hypothesis that generative models trained to reproduce experimental images inherently capture the dynamical processes that generated those observations. By integrating these trained generative models into Monte Carlo simulations, we generate plausible transformation pathways that interpolate unobserved intermediate stages. This approach enables the extraction of meaningful insights and the statistical analysis of material dynamics. This study also evaluates the framework's applicability across three phenomena: tantalum test chart translation, gold nanoparticle diffusion in polyvinyl alcohol solution, and copper sulfidation in heterogeneous rubber/brass composites. The generated transformations closely replicate experimental observations while revealing previously unrecognized dynamic behaviors for future experimental validation. These findings suggest that learned generative models encode physically meaningful continuity, enabling statistical interpolation of unobserved intermediate states and classification of transformation modes under sparse observational constraints.

Journal: Digital Discovery. Volume: 11. Pages: 3363-3377. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00277j?page=search
Citations: 0
Screening Diels–Alder reaction space to identify candidate reactions for self-healing polymer applications
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-20 DOI: 10.1039/D5DD00340G
Maxime Ferrer, Bowen Deng, Javier E. Alfonso-Ramos and Thijs Stuyver

Plastics are essential in modern society, but their susceptibility to damage limits their lifespan and performance, and results in unsustainable waste production. Self-healing polymers based on thermally reversible Diels–Alder (DA) reactions offer a potential solution by enabling heat-controlled repair through bond breaking and reformation. However, discovering new suitable DA monomer combinations has so far relied largely on intuition and trial and error. Here, we present a hierarchical workflow that integrates machine learning (ML) with automated reaction profile calculations to efficiently screen DA reactions for self-healing polymer applications. Using our in-house TS-tools software, we generate high-throughput profiles at the semi-empirical xTB level. Refining only a small fraction with DFT, we are able to train a robust ML model that predicts reaction characteristics with excellent accuracy. Adding a graph-based ML model to the workflow for pre-screening enables expansion to reaction spaces of hundreds of thousands of reactions, at a marginal cost. We first leverage our models to screen a comprehensive reaction space of synthetic diene–dienophile pairs, and subsequently use them to mine a database of commercially available natural products. Overall, this hybrid ML-computational chemistry approach enables data-efficient discovery of thermally responsive DA reactions, advancing the rational design of self-healing polymers with tunable properties.
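The hierarchical idea described in this abstract, a cheap method applied to everything and a costly method applied to only a shortlist, can be sketched as a generic two-stage funnel. The sketch below is an assumption-laden illustration, not the TS-tools workflow: the function `funnel_screen`, the reaction labels, the barrier values, and the target window are all hypothetical.

```python
def funnel_screen(candidates, cheap_score, expensive_eval, n_refine, accept):
    """Rank all candidates with a cheap score, then re-evaluate only the best few."""
    ranked = sorted(candidates, key=cheap_score)   # lower score = more promising
    shortlist = ranked[:n_refine]
    refined = {name: expensive_eval(name) for name in shortlist}
    return {name: value for name, value in refined.items() if accept(value)}


# Hypothetical barrier heights (arbitrary energy units), invented for illustration.
CHEAP_BARRIER = {"rxn_a": 18.0, "rxn_b": 25.0, "rxn_c": 31.0, "rxn_d": 22.0, "rxn_e": 40.0}
REFINED_BARRIER = {"rxn_a": 20.5, "rxn_b": 24.0, "rxn_c": 29.0, "rxn_d": 23.5, "rxn_e": 38.0}
TARGET = 24.0  # hypothetical desired barrier

expensive_calls = []  # records which candidates paid the high-level cost


def refine(name):
    """Stand-in for a costly high-level calculation; here just a table lookup."""
    expensive_calls.append(name)
    return REFINED_BARRIER[name]


hits = funnel_screen(
    candidates=list(CHEAP_BARRIER),
    cheap_score=lambda name: abs(CHEAP_BARRIER[name] - TARGET),
    expensive_eval=refine,
    n_refine=3,
    accept=lambda barrier: abs(barrier - TARGET) <= 1.0,
)
print(hits, expensive_calls)
```

Only three of the five toy candidates reach the costly stage. The same structure extends to more tiers, for example by placing an even cheaper pre-screening score in front of the first stage.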

Journal: Digital Discovery. Volume: 11. Pages: 3400-3410. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00340g?page=search
Citations: 0
ELECTRUM: an electron configuration-based universal metal fingerprint for transition metal compounds
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-20 DOI: 10.1039/D5DD00145E
Markus Orsi and Angelo Frei

Machine learning has experienced a drastic rise in interest and applications in all fields of chemistry, enabling researchers to leverage large chemical datasets to gain novel insights. The success of machine learning-driven projects in chemistry hinges on three key factors: access to robust and comprehensive datasets, a well-defined objective, and effective molecular representations that convert chemical structures into machine-readable formats. Transition metal complexes have lagged behind their organic counterparts on all three of these avenues. The large diversity of structures, coordination numbers and coordination modes has made their translation to a machine-readable format an ongoing challenge. Here we introduce ELECTRUM, an electron configuration-based universal metal fingerprint for transition metal compounds. Its lightweight implementation enables the straightforward conversion of any transition metal complex into a simple fingerprint. Utilising a novel dataset generated from the Cambridge Structural Database (CSD), we demonstrate that ELECTRUM effectively captures the structural diversity of transition metal complexes. By plotting nearest-neighbor relationships in ELECTRUM space, we reveal meaningful clustering in two-dimensional representations. Furthermore, we use the ELECTRUM encoding to train machine learning models on the prediction of metal complex coordination numbers from ligand structures and metal identity alone. We show that on a subset of this data, we can train models to predict the oxidation state of metal complexes. These case studies showcase the potential of ELECTRUM as an easy-to-implement fingerprint for metal complexes. We rely on the community to further test, validate, and improve it.

Journal: Digital Discovery. Volume: 12. Pages: 3567-3577. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12548721/pdf/
Citations: 0
The 2D-drone swarm, a safe open-source sample transfer system for fully automated laboratories
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-17 DOI: 10.1039/D5DD00342C
Edy Mariano, Yannis Coderey, Yasmine El Goumi, Jasper Tan, Tanguy Cavagna, Jean-Charles Cousty, Vincenzo Scamarcio, Josie Hughes and Pascal Miéville

Laboratory automation is an active field in biology, drug discovery, and more recently in synthetic chemistry and materials science. Local automation has existed in the field for quite some time, but long-range or total laboratory automation is much less developed. In this article, we present a complete, open and decentralized global automation system called the 2D drone swarm system. It is based on a simple approach: small mobile robots move autonomously on a dedicated track suspended above the scientific equipment to carry out long-distance sample transfers, and they are closely connected to localized robotic arms dedicated to short-distance transfers, interaction with scientific equipment and direct sample processing. This approach is inspired by the Kiva/Amazon model, where isolated autonomous mobile robots automatically deliver goods to external operators. It is also inspired by the modern automotive industry, such as Tesla's Gigafactories, to provide an evolutionary and flexible system that can adapt to numerous types of tasks with a minimum amount of resources and easily adapt to different types of workstations. This global automation system is controlled directly from the Laboratory Scheduler by a Robot Subscheduler, coded in an open-source environment, which takes care of all mobile and local robot operations. The result is a global laboratory automation system that is safe for operators and scientific equipment, cost- and energy-efficient, easily extensible and open-source, and that can be adapted to many different applications and laboratories.

{"title":"The 2D-drone swarm, a safe open-source sample transfer system for fully automated laboratories","authors":"Edy Mariano, Yannis Coderey, Yasmine El Goumi, Jasper Tan, Tanguy Cavagna, Jean-Charles Cousty, Vincenzo Scamarcio, Josie Hughes and Pascal Miéville","doi":"10.1039/D5DD00342C","DOIUrl":"https://doi.org/10.1039/D5DD00342C","url":null,"abstract":"<p >Laboratory automation is an active field in biology, drug discovery, and more recently in synthetic chemistry and materials science. Local automation has existed in the field for quite some time, but long-range or total laboratory automation is much less developed. In this article, we present a complete, open and decentralized global automation system called the 2D drone swarm system. It is based on a simple approach of small mobile robots moving autonomously in a dedicated track suspended above the scientific equipment for the long-distance sample and closely connected to localized robotic arms dedicated to short-distance transfers, interaction with scientific equipment and direct sample processing. This approach is inspired by the Kiva/Amazon model, where isolated autonomous mobile robots automatically deliver goods to external operators. It is also inspired by the modern automotive industry, such as Tesla's Gigafactories, to provide an evolutionary and flexible system that can adapt to numerous types of tasks with a minimum amount of resources and easily adapt to different types of workstations. This global automation system is controlled directly from the Laboratory Scheduler by a Robot Subscheduler, coded in an open-source environment, which takes care of all mobile and local robot operations. 
The result is an operator and scientific equipment safe, cost and energy-efficient, easily extensible and open-source global laboratory automation system that can be adapted to many different applications and laboratories.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 3162-3174"},"PeriodicalIF":6.2,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00342c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145442792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep learning reveals key predictors of thermal conductivity in covalent organic frameworks
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-17 DOI: 10.1039/D5DD00126A
Prakash Thakolkaran, Yiwen Zheng, Yaqi Guo, Aniruddh Vashisth and Siddhant Kumar

The thermal conductivity of covalent organic frameworks (COFs), an emerging class of nanoporous polymeric materials, is crucial for many applications, yet the link between their structure and thermal properties remains poorly understood. Analysis of a dataset containing over 2400 COFs reveals that conventional features such as density, pore size, void fraction, and surface area do not reliably predict thermal conductivity. To address this, an attention-based machine learning model was trained, accurately predicting thermal conductivities even for structures outside the training set. The attention mechanism was then utilized to investigate the model's success. The analysis identified dangling molecular branches as a key predictor of thermal conductivity, leading us to define the dangling mass ratio (DMR), a descriptor that quantifies the fraction of atomic mass in dangling branches relative to the total COF mass. Feature importance assessments on regression models confirm the significance of DMR in predicting thermal conductivity. These findings indicate that COFs with dangling functional groups exhibit lower thermal transfer capabilities. Molecular dynamics simulations support this observation, revealing significant mismatches in the vibrational density of states due to the presence of dangling branches.
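
As a rough illustration of the descriptor defined in the abstract above, DMR = (atomic mass in dangling branches) / (total COF mass). The abstract does not say how dangling branches are identified, so the sketch below assumes one simple rule: an atom counts as dangling if it can be stripped away by repeatedly removing atoms that have at most one remaining bonded neighbor, which leaves only ring or network atoms. The function name `dangling_mass_ratio`, the input format, and the toy fragment are hypothetical and are not taken from the paper; a periodic framework would likely need a more careful treatment.

```python
from collections import defaultdict


def dangling_mass_ratio(masses, bonds):
    """Fraction of total mass carried by dangling atoms.

    masses: dict mapping atom id -> atomic mass
    bonds:  iterable of (i, j) pairs of bonded atom ids
    Assumption (not from the paper): dangling atoms are those removed by
    iteratively pruning atoms with at most one remaining neighbor.
    """
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    remaining = set(masses)
    dangling = set()
    # Start from atoms that currently have zero or one bonded neighbor.
    queue = [a for a in remaining if len(neighbors[a]) <= 1]
    while queue:
        a = queue.pop()
        if a not in remaining:
            continue
        remaining.discard(a)
        dangling.add(a)
        # Removing `a` may turn its neighbors into new leaves.
        for b in neighbors[a]:
            neighbors[b].discard(a)
            if b in remaining and len(neighbors[b]) <= 1:
                queue.append(b)

    total = sum(masses.values())
    return sum(masses[a] for a in dangling) / total


# Toy fragment: a four-membered ring (atoms 0-3, mass 12 each) with a
# two-atom side branch (atom 4, mass 12, and atom 5, mass 1).
toy_masses = {0: 12.0, 1: 12.0, 2: 12.0, 3: 12.0, 4: 12.0, 5: 1.0}
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5)]
print(round(dangling_mass_ratio(toy_masses, toy_bonds), 3))  # 0.213
```

In the toy fragment only atoms 4 and 5 are pruned, so the ratio is 13/61. Under this pruning rule terminal hydrogens also count as dangling, which may or may not match the authors' definition.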

Digital Discovery, volume 11, pages 3351-3362. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00126a?page=search
Citations: 0
Correction: Enhancing multifunctional drug screening via artificial intelligence
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-17 DOI: 10.1039/D5DD90048D
Junlin Dong, Chenyang Wu, Tianle Lu, Shiyu Wang, Wenjin Zhan, Marc Xu, Bing Wang, Zhenquan Hu, Horst Vogel and Shuguang Yuan

Correction for ‘Enhancing multifunctional drug screening via artificial intelligence’ by Junlin Dong et al., Digital Discovery, 2025, 4, 2012–2024, https://doi.org/10.1039/D5DD00082C.

Digital Discovery, volume 11, pages 3411-3411. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd90048d?page=search
Citations: 0
Optimal message passing for molecular prediction is simple, attentive and spatial
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2025-10-16 DOI: 10.1039/D5DD00193E
Alma C. Castañeda-Leautaud and Rommie E. Amaro

Strategies to improve the predicting performance of Message-Passing Neural-Networks for molecular property predictions can be achieved by simplifying how the message is passed and by using descriptors that capture multiple aspects of molecular graphs. In this work, we designed model architectures that achieved state-of-the-art performance, surpassing more complex models such as those pre-trained on external databases. We assessed dataset diversity to complement our performance results, finding that structural diversity influences the need for additional components in our MPNNs and feature sets. In most datasets, our best architecture employs bidirectional message-passing with an attention mechanism, applied to a minimalist message formulation that excludes self-perception, highlighting that relatively simpler models, compared to classical MPNNs, yield higher class separability. In contrast, we found that convolution normalization factors do not benefit the predictive power in all the datasets tested. This was corroborated in both global and node-level outputs. Additionally, we analyzed the influence of both adding spatial features and working with 3D graphs, finding that 2D molecular graphs are sufficient when complemented with appropriately chosen 3D descriptors. This approach not only preserves predictive performance but also reduces computational cost by over 50%, making it particularly advantageous for high-throughput screening campaigns.
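
To make the ingredients named in the abstract above more concrete, here is a minimal sketch of one message-passing step that is bidirectional (each bond carries messages both ways), attention-weighted, and free of a self term (a node's update is built only from its neighbors). This is a generic, GAT-style illustration and not the authors' architecture; the function name `attentive_message_pass`, the parameters `W` and `a`, and the scoring rule are all assumptions made for the example.

```python
import numpy as np


def attentive_message_pass(h, bonds, W, a):
    """One attention-weighted message-passing step without a self term.

    h:     (n, d_in) array of node features
    bonds: list of undirected (i, j) bonds; each is used in both directions
    W:     (d_in, d_out) linear map applied to every node (hypothetical)
    a:     (2 * d_out,) attention vector scoring receiver/sender pairs
    """
    n = h.shape[0]
    z = h @ W

    # Build neighbor lists so that every bond sends messages both ways.
    senders = [[] for _ in range(n)]
    for i, j in bonds:
        senders[i].append(j)
        senders[j].append(i)

    out = np.zeros_like(z)
    for i in range(n):
        if not senders[i]:
            continue  # isolated node: no incoming messages
        nbr = np.array(senders[i])
        # Score each (receiver, sender) pair, then softmax over the senders.
        receiver = np.repeat(z[i][None, :], len(nbr), axis=0)
        scores = np.concatenate([receiver, z[nbr]], axis=1) @ a
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # The update uses neighbor states only; z[i] itself is not added.
        out[i] = weights @ z[nbr]
    return out


# Usage on a three-atom chain 0-1-2 with random features.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
updated = attentive_message_pass(
    features, [(0, 1), (1, 2)], np.eye(4), np.zeros(8)
)
print(updated.shape)  # (3, 4)
```

With a zero attention vector the weights are uniform, so the middle atom receives the plain average of its two neighbors; a learned `a` would reweight them.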

Digital Discovery, volume 11, pages 3320-3338. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00193e?page=search
Citations: 0