We present a physics-informed machine learning approach to predict the glass transition temperature (Tg) of sodium borosilicate glasses. Four models (random forest, extreme gradient boosting, support vector machines, and K-nearest neighbors) were trained using both compositional features and structural features derived from statistical mechanics. Incorporating these structural descriptors significantly improved model performance: for the random forest model on the test set, the mean absolute error decreased from 14.85 K to 13.76 K, the root mean square error decreased from 21.78 K to 19.12 K, and R² increased from 0.88 to 0.91. Similar improvements were observed for the other models. Building on this, we propose a three-step predictive strategy that enhances generalization across compositions and accurately predicts the Tg of unseen compositions, achieving a mean absolute error of approximately 8 K and an R² of around 0.98. When benchmarked against GlassNet, the current state of the art in property prediction for glasses, our method achieves improved accuracy. These results highlight the importance of structural information for improving the predictive capability of machine learning models trained on small, composition-specific datasets. This approach can assist in the rapid screening and design of glass materials, reducing reliance on time-consuming experiments and guiding future research toward targeted property optimization.
‘An improved machine learning strategy using structural features to predict the glass transition temperature of oxide glasses’ by Satwinder Singh Danewalia and Kulvir Singh, Digital Discovery, 2025, 4(12), 3764–3773, https://doi.org/10.1039/D5DD00326A (published 24 October 2025).
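The Tg abstract compares models by MAE, RMSE and R² on a held-out test set. As a minimal sketch of how these standard metrics are computed (the Tg values in the usage note are invented for illustration and are not taken from the paper; `regression_metrics` is a hypothetical helper name):

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Mean absolute error: average magnitude of the residuals (same units as y).
    mae = sum(abs(r) for r in residuals) / n
    # Root mean square error: squaring weights large residuals more heavily.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

For example, `regression_metrics([800, 820, 840, 860], [810, 815, 845, 850])` gives an MAE of 7.5 K, an RMSE of about 7.91 K and an R² of 0.875. Because RMSE squares residuals before averaging, it penalizes large errors more heavily than MAE, which is why the two are usually reported together.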
Maximiliam Fleck, Marcelle B. M. Spera, Samir Darouich, Timo Klenk and Niels Hansen
Data-driven approaches for predicting thermophysical properties benefit from physical constraints, which can improve extrapolation behavior and reduce the amount of training data required. In the present work, the well-established entropy scaling approach is incorporated into a neural network architecture to predict the shear viscosity of a diverse set of pure fluids over a large temperature and pressure range. Instead of imposing a particular form for the reference entropy and reference shear viscosity, these properties are learned. The resulting architecture can be interpreted as two linked DeepONets with generalization capabilities.
‘Generalized DeepONets for viscosity prediction using learned entropy scaling references’ by Maximiliam Fleck, Marcelle B. M. Spera, Samir Darouich, Timo Klenk and Niels Hansen, Digital Discovery, 2025, 4(12), 3578–3587, https://doi.org/10.1039/D5DD00179J (published 22 October 2025).
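A DeepONet approximates an operator as G(u)(y) ≈ Σₖ bₖ(u)·tₖ(y) + bias, where a branch net encodes the input function (here, a fluid) and a trunk net encodes the query point (here, a thermodynamic state). The sketch below shows that forward pass in plain Python and then wires two such networks into an entropy-scaling relation, ln η = ln η_ref + g(s*), with s* = −s_res / s_ref. This wiring, the fluid and state encodings and all function names are hypothetical illustrations of "two linked DeepONets", not the authors' architecture; weights are random and untrained.

```python
import math
import random


def init_mlp(sizes, rng):
    """Random weights for a fully connected net with layer widths `sizes`."""
    layers = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)] for _ in range(n)]
        layers.append((W, [0.0] * n))
    return layers


def mlp(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    h = list(x)
    for i, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            h = [math.tanh(v) for v in h]
    return h


def deeponet(branch, trunk, bias, u, y):
    """DeepONet output G(u)(y) = sum_k b_k(u) * t_k(y) + bias."""
    b = mlp(branch, u)
    t = mlp(trunk, y)
    return sum(bk * tk for bk, tk in zip(b, t)) + bias


def make_model(n_fluid, n_state, width=16, p=8, seed=0):
    """Two DeepONets (for ln eta_ref and ln s_ref) plus a scalar scaling function g."""
    rng = random.Random(seed)

    def onet():
        return (init_mlp([n_fluid, width, p], rng), init_mlp([n_state, width, p], rng), 0.0)

    return {"eta_ref": onet(), "s_ref": onet(), "g": init_mlp([1, width, 1], rng)}


def predict_viscosity(model, fluid, state, s_res):
    """Entropy-scaling sketch: ln(eta) = ln(eta_ref) + g(s_star), s_star = -s_res / s_ref."""
    ln_eta_ref = deeponet(*model["eta_ref"], fluid, state)
    s_ref = math.exp(deeponet(*model["s_ref"], fluid, state))  # exp keeps s_ref > 0
    s_star = -s_res / s_ref  # residual entropy is typically negative, so s_star >= 0
    return math.exp(ln_eta_ref + mlp(model["g"], [s_star])[0])
```

Predicting the log of the reference viscosity and exponentiating at the end is one simple way to keep the predicted viscosity positive; training these networks on viscosity data is outside the scope of this sketch.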
Leonard Galustian, Konstantin Mark, Johannes Karwounopoulos, Maximilian P.-P. Kovar and Esther Heid
Transition state (TS) geometries of chemical reactions are key to understanding reaction mechanisms and estimating kinetic properties. Inferring them directly from 2D reaction graphs offers chemists a powerful tool for rapid and accessible reaction analysis. Quantum chemical methods for computing TSs are computationally intensive and often infeasible for larger molecular systems. Recently, deep learning-based diffusion models have shown promise in generating TSs from 2D reaction graphs for single-step reactions. However, framing TS generation as a diffusion process, by design, requires a prohibitively large number of sampling steps during inference. Here we show that modeling TS generation as an optimal transport flow problem, solved via E(3)-equivariant flow matching with geometric tensor networks, achieves over a hundredfold speedup in inference while improving geometric accuracy compared to the state of the art. This gain in sampling efficiency and predictive accuracy enables the practical use of deep learning-based TS generators in high-throughput settings for larger and more complex chemical systems. Our method, GoFlow, thus represents a significant methodological advance in machine learning-based TS generation, bringing it closer to widespread use in computational chemistry workflows.
‘GoFlow: efficient transition state geometry prediction with flow matching and E(3)-equivariant neural networks’ by Leonard Galustian, Konstantin Mark, Johannes Karwounopoulos, Maximilian P.-P. Kovar and Esther Heid, Digital Discovery, 2025, 4(12), 3492–3501, https://doi.org/10.1039/D5DD00283D (published 21 October 2025).
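Why optimal transport flow matching needs so few sampling steps can be seen in a toy setting. In conditional flow matching with straight (OT) paths, training points are x_t = (1 − t)·x₀ + t·x₁ and the regression target is the constant velocity x₁ − x₀; sampling integrates the learned velocity field from t = 0 to t = 1. The sketch below is a generic illustration on flat coordinate lists, not GoFlow itself: it ignores E(3) equivariance, molecular graphs and the neural network, and `model` stands for any hypothetical velocity predictor.

```python
import random


def ot_path(x0, x1, t):
    """Point on the straight (optimal-transport) path from x0 to x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Conditional flow-matching regression target: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def cfm_loss(model, pairs, rng):
    """Mean squared error between model(x_t, t) and the target velocity over (x0, x1) pairs."""
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()
        pred = model(ot_path(x0, x1, t), t)
        total += sum((p - q) ** 2 for p, q in zip(pred, target_velocity(x0, x1)))
    return total / len(pairs)


def euler_sample(velocity_fn, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

If the learned field were exactly the constant velocity of a single pair, `euler_sample(lambda x, t: target_velocity(x0, x1), x0, 1)` would land on `x1` in one step. A trained network only approximates a field averaged over many pairs, so real samplers still take several steps, but straight paths are what keep that number small compared with diffusion sampling.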
Duc-Anh Dao, Minh-Quyet Ha, Tien-Sinh Vu, Shuntaro Takazawa, Nozomu Ishiguro, Yukio Takahashi, Suzuki Masato, Takashi Kakubo, Naoya Amino, Hirosuke Matsui, Mizuki Tada and Hieu-Chi Dam
Understanding nanoscale material evolution (including phase transitions, structural deformations, and chemical reactions) under dynamic conditions remains a fundamental challenge in materials science. While advanced imaging techniques enable visualization of transformation processes, they typically capture only discrete observations at specific time intervals. Consequently, intermediate stages and alternative pathways between captured images often remain unresolved, introducing ambiguity into the analysis of material dynamics and transformation mechanisms. To address these limitations, we present a two-stage framework that uses deep generative models to probabilistically reconstruct intermediate transformations. The framework is based on the hypothesis that generative models trained to reproduce experimental images inherently capture the dynamical processes that generated those observations. By integrating these trained generative models into Monte Carlo simulations, we generate plausible transformation pathways that interpolate unobserved intermediate stages. This approach enables the extraction of physical insight and the statistical analysis of material dynamics. We evaluate the framework on three phenomena: tantalum test chart translation, gold nanoparticle diffusion in polyvinyl alcohol solution, and copper sulfidation in heterogeneous rubber/brass composites. The generated transformations closely replicate experimental observations while revealing previously unrecognized dynamic behaviors that merit future experimental validation. These findings suggest that learned generative models encode physically meaningful continuity, enabling statistical interpolation of unobserved intermediate states and classification of transformation modes under sparse observational constraints.
‘Material dynamics analysis with deep generative model’ by Duc-Anh Dao, Minh-Quyet Ha, Tien-Sinh Vu, Shuntaro Takazawa, Nozomu Ishiguro, Yukio Takahashi, Suzuki Masato, Takashi Kakubo, Naoya Amino, Hirosuke Matsui, Mizuki Tada and Hieu-Chi Dam, Digital Discovery, 2025, 4(11), 3363–3377, https://doi.org/10.1039/D5DD00277J (published 20 October 2025).
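The material-dynamics abstract does not spell out how the generative model is embedded in the Monte Carlo simulations. One generic way to picture the idea, purely as an illustrative assumption and not the authors' method, is to encode two observed frames into a generator's latent space, draw many random latent paths pinned at those two points, and decode every intermediate point. The sketch uses a Brownian bridge for the random paths; `decode` is a hypothetical placeholder for a trained generator (identity by default), and all function names are illustrative.

```python
import math
import random


def brownian_bridge(z_start, z_end, n_steps, sigma, rng):
    """Sample one latent path pinned at z_start (t = 0) and z_end (t = 1).

    Exact bridge transition from time s to t: the mean moves a fraction
    (t - s) / (1 - s) of the way to z_end, with variance
    sigma**2 * (t - s) * (1 - t) / (1 - s).
    """
    path = [list(z_start)]
    z = list(z_start)
    for k in range(1, n_steps):
        s, t = (k - 1) / n_steps, k / n_steps
        frac = (t - s) / (1.0 - s)
        std = sigma * math.sqrt((t - s) * (1.0 - t) / (1.0 - s))
        z = [zi + frac * (ei - zi) + std * rng.gauss(0.0, 1.0) for zi, ei in zip(z, z_end)]
        path.append(z)
    path.append(list(z_end))
    return path


def sample_pathways(z_start, z_end, n_paths=100, n_steps=10, sigma=0.5, seed=0,
                    decode=lambda z: z):
    """Monte Carlo ensemble of decoded pathways between two observed states."""
    rng = random.Random(seed)
    return [[decode(z) for z in brownian_bridge(z_start, z_end, n_steps, sigma, rng)]
            for _ in range(n_paths)]


def mean_path(paths):
    """Pointwise ensemble mean of equally long paths of vectors."""
    n = len(paths)
    return [[sum(p[k][d] for p in paths) / n for d in range(len(paths[0][k]))]
            for k in range(len(paths[0]))]
```

With `sigma=0` every sampled path collapses to linear interpolation in latent space; larger `sigma` spreads the ensemble, and statistics over the decoded paths (means, clusters of distinct routes) are the kind of quantity such a Monte Carlo analysis would report.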
Maxime Ferrer, Bowen Deng, Javier E. Alfonso-Ramos and Thijs Stuyver
Plastics are essential in modern society, but their susceptibility to damage limits their lifespan and performance and results in unsustainable waste production. Self-healing polymers based on thermally reversible Diels–Alder (DA) reactions offer a potential solution by enabling heat-controlled repair through bond breaking and reformation. However, the discovery of suitable new DA monomer combinations has so far relied largely on intuition and trial and error. Here, we present a hierarchical workflow that integrates machine learning (ML) with automated reaction profile calculations to efficiently screen DA reactions for self-healing polymer applications. Using our in-house TS-tools software, we generate reaction profiles in high throughput at the semi-empirical xTB level. By refining only a small fraction of these with DFT, we train a robust ML model that predicts reaction characteristics with excellent accuracy. Adding a graph-based ML model for pre-screening extends the workflow to reaction spaces of hundreds of thousands of reactions at marginal additional cost. We first use our models to screen a comprehensive reaction space of synthetic diene–dienophile pairs, and then use them to mine a database of commercially available natural products. Overall, this hybrid ML-computational chemistry approach enables data-efficient discovery of thermally responsive DA reactions, advancing the rational design of self-healing polymers with tunable properties.
‘Screening Diels–Alder reaction space to identify candidate reactions for self-healing polymer applications’ by Maxime Ferrer, Bowen Deng, Javier E. Alfonso-Ramos and Thijs Stuyver, Digital Discovery, 2025, 4(11), 3400–3410, https://doi.org/10.1039/D5DD00340G (published 20 October 2025).
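The hierarchical Diels–Alder workflow is essentially a screening funnel: a cheap model filters a large reaction space, and only the survivors reach costlier levels of theory. A minimal sketch of that control flow follows, in which the stage names mirror the abstract (graph-based ML, xTB, DFT) but the scoring functions, costs and keep fractions are hypothetical placeholders. In the paper, DFT refinement is used to train an ML model, which this toy funnel does not reproduce; only the idea of promoting a fraction of candidates to each more expensive stage is illustrated.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    score: Callable[[Hashable], float]  # lower = more promising at this fidelity
    cost_per_item: float                # relative cost units (hypothetical)
    keep_fraction: float                # fraction of the pool promoted to the next stage


def run_funnel(candidates: Sequence[Hashable], stages: List[Stage]):
    """Screen candidates through increasingly expensive stages, keeping the best-scored fraction."""
    pool = list(candidates)
    total_cost = 0.0
    log: List[Tuple[str, int, int]] = []
    for stage in stages:
        total_cost += stage.cost_per_item * len(pool)  # every item in the pool is evaluated
        ranked = sorted(pool, key=stage.score)
        n_keep = max(1, round(len(ranked) * stage.keep_fraction))
        log.append((stage.name, len(ranked), n_keep))
        pool = ranked[:n_keep]
    return pool, total_cost, log
```

For instance, with 1,000 integer stand-ins for reactions and stages costing 0.001, 1 and 100 units per item with keep fractions 0.1, 0.2 and 0.25, the funnel evaluates 1,000, 100 and 20 candidates for a total of about 2,101 units, versus 100,000 units for running the most expensive stage on everything.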
Machine learning has seen a dramatic rise in interest and applications across all fields of chemistry, enabling researchers to leverage large chemical datasets to gain novel insights. The success of machine learning-driven projects in chemistry hinges on three key factors: access to robust and comprehensive datasets, a well-defined objective, and effective molecular representations that convert chemical structures into machine-readable formats. Transition metal complexes have lagged behind their organic counterparts on all three fronts. The wide diversity of their structures, coordination numbers, and coordination modes has made their translation into a machine-readable format an ongoing challenge. Here we introduce ELECTRUM, an electron configuration-based universal metal fingerprint for transition metal compounds. Its lightweight implementation enables the straightforward conversion of any transition metal complex into a simple fingerprint. Using a novel dataset generated from the Cambridge Structural Database (CSD), we demonstrate that ELECTRUM effectively captures the structural diversity of transition metal complexes. By plotting nearest-neighbor relationships in ELECTRUM space, we reveal meaningful clustering in two-dimensional representations. Furthermore, we use the ELECTRUM encoding to train machine learning models that predict metal complex coordination numbers from ligand structures and metal identity alone. On a subset of these data, we also train models to predict the oxidation state of metal complexes. These case studies showcase the potential of ELECTRUM as an easy-to-implement fingerprint for metal complexes. We invite the community to further test, validate, and improve it.
‘ELECTRUM: an electron configuration-based universal metal fingerprint for transition metal compounds’ by Markus Orsi and Angelo Frei, Digital Discovery, 2025, 4(12), 3567–3577, https://doi.org/10.1039/D5DD00145E (published 20 October 2025).
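The ELECTRUM abstract does not give the fingerprint's exact encoding. As a loosely related illustration only, the sketch below turns an element's ground-state electron configuration, filled by the Madelung (n + l) rule, into a fixed-length occupancy vector that could be concatenated with a ligand fingerprint. The function names are hypothetical, this is not ELECTRUM, and the Madelung rule misses known exceptions such as Cr and Cu (for Cu it predicts 3d⁹4s² rather than the observed 3d¹⁰4s¹).

```python
SUBSHELL_LETTERS = "spdf"
CAPACITY = {0: 2, 1: 6, 2: 10, 3: 14}  # 2 * (2l + 1) electrons per subshell


def madelung_order(max_n=7):
    """Subshells (n, l) sorted by n + l, ties broken by smaller n (Madelung rule)."""
    subshells = [(n, l) for n in range(1, max_n + 1) for l in range(min(n, 4))]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))


def electron_configuration(z):
    """Idealized ground-state configuration as [(label, electrons), ...]."""
    if z < 1:
        raise ValueError("atomic number must be positive")
    config, remaining = [], z
    for n, l in madelung_order():
        if remaining == 0:
            break
        electrons = min(remaining, CAPACITY[l])
        config.append((f"{n}{SUBSHELL_LETTERS[l]}", electrons))
        remaining -= electrons
    if remaining:
        raise ValueError("atomic number too large for the subshells considered")
    return config


def config_vector(z):
    """Fixed-length occupancy vector over all Madelung-ordered subshells."""
    occupancy = dict(electron_configuration(z))
    return [occupancy.get(f"{n}{SUBSHELL_LETTERS[l]}", 0) for n, l in madelung_order()]
```

For iron (Z = 26) this yields …4s² 3d⁶, and every element maps to a vector of the same length, which is the property that makes such an encoding usable as a model input.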
Edy Mariano, Yannis Coderey, Yasmine El Goumi, Jasper Tan, Tanguy Cavagna, Jean-Charles Cousty, Vincenzo Scamarcio, Josie Hughes and Pascal Miéville
Laboratory automation is an active field in biology and drug discovery and, more recently, in synthetic chemistry and materials science. Local automation has existed in these fields for some time, but long-range or total laboratory automation is much less developed. In this article, we present a complete, open, and decentralized global automation system called the 2D drone swarm system. It is based on a simple approach: small mobile robots move autonomously along a dedicated track suspended above the scientific equipment to carry samples over long distances, and they are closely coupled to localized robotic arms that handle short-distance transfers, interaction with scientific equipment, and direct sample processing. The approach is inspired by the Kiva/Amazon model, in which isolated autonomous mobile robots deliver goods to external operators. It also draws on the modern automotive industry, such as Tesla's Gigafactories, to provide an evolvable, flexible system that can take on numerous types of tasks with minimal resources and adapt easily to different types of workstations. This global automation system is controlled directly from the Laboratory Scheduler by a Robot Subscheduler, coded in an open-source environment, which handles all mobile and local robot operations. The result is an open-source global laboratory automation system that is safe for operators and scientific equipment, cost- and energy-efficient, easily extensible, and adaptable to many different applications and laboratories.
‘The 2D-drone swarm, a safe open-source sample transfer system for fully automated laboratories’ by Edy Mariano, Yannis Coderey, Yasmine El Goumi, Jasper Tan, Tanguy Cavagna, Jean-Charles Cousty, Vincenzo Scamarcio, Josie Hughes and Pascal Miéville, Digital Discovery, 2025, 4(11), 3162–3174, https://doi.org/10.1039/D5DD00342C (published 17 October 2025).
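The 2D drone swarm abstract states that a Robot Subscheduler handles all mobile and local robot operations but does not describe its dispatching policy. As a hypothetical illustration of the kind of decision such a component must make, this sketch assigns each sample transfer to whichever mobile robot could complete it earliest. The class and function names, the rectilinear track geometry and the greedy policy are all assumptions; a real system also needs collision and traffic management on the track, which is ignored here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


def track_distance(a: Point, b: Point) -> float:
    """Rectilinear (Manhattan) distance, assuming robots move along orthogonal track segments."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Robot:
    name: str
    position: Point
    busy_until: float = 0.0


@dataclass
class Transfer:
    sample_id: str
    pickup: Point
    dropoff: Point
    ready_at: float


def dispatch(robots: List[Robot], transfers: List[Transfer], speed: float = 1.0):
    """Greedy earliest-finish assignment of sample transfers to mobile robots."""
    schedule = []
    for job in sorted(transfers, key=lambda j: j.ready_at):
        best = None
        for robot in robots:
            # Drive to the pickup as soon as the robot is free, then wait for the sample if needed.
            travel = track_distance(robot.position, job.pickup) / speed
            arrive = max(robot.busy_until + travel, job.ready_at)
            finish = arrive + track_distance(job.pickup, job.dropoff) / speed
            if best is None or finish < best[0]:
                best = (finish, robot)
        finish, robot = best
        robot.position, robot.busy_until = job.dropoff, finish
        schedule.append((job.sample_id, robot.name, finish))
    return schedule
```

Greedy earliest-finish assignment is easy to reason about and cheap to recompute whenever a new transfer request arrives, at the cost of not being globally optimal.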
The thermal conductivity of covalent organic frameworks (COFs), an emerging class of nanoporous polymeric materials, is crucial for many applications, yet the link between their structure and thermal properties remains poorly understood. Analysis of a dataset of over 2400 COFs reveals that conventional features such as density, pore size, void fraction, and surface area do not reliably predict thermal conductivity. To address this, we trained an attention-based machine learning model that accurately predicts thermal conductivity even for structures outside the training set. We then used the attention mechanism to investigate why the model succeeds. This analysis identified dangling molecular branches as a key predictor of thermal conductivity, leading us to define the dangling mass ratio (DMR), a descriptor that quantifies the fraction of atomic mass in dangling branches relative to the total COF mass. Feature importance assessments on regression models confirm the significance of DMR for predicting thermal conductivity. These findings indicate that COFs with dangling functional groups exhibit lower thermal conductivity. Molecular dynamics simulations support this observation, revealing significant mismatches in the vibrational density of states due to the presence of dangling branches.
‘Deep learning reveals key predictors of thermal conductivity in covalent organic frameworks’ by Prakash Thakolkaran, Yiwen Zheng, Yaqi Guo, Aniruddh Vashisth and Siddhant Kumar, Digital Discovery, 2025, 4(11), 3351–3362, https://doi.org/10.1039/D5DD00126A (published 17 October 2025).
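The COF abstract defines DMR as the fraction of atomic mass in dangling branches relative to the total COF mass, but not how dangling atoms are identified. One plausible graph-based reading, which is an assumption rather than the authors' procedure, treats the framework backbone as the 2-core of the bond graph: atoms are repeatedly pruned while they have at most one remaining neighbor, and everything pruned counts as dangling. For a periodic COF, bonds that cross the cell boundary must be included, or the backbone itself would be pruned. `dangling_mass_ratio` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def dangling_mass_ratio(masses, bonds):
    """Mass fraction of atoms outside the 2-core of the bond graph.

    masses: dict mapping atom id -> atomic mass
    bonds:  iterable of (atom_a, atom_b) pairs, including periodic-boundary bonds
    """
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {atom: len(neighbors[atom]) for atom in masses}
    # Start pruning from terminal (degree <= 1) atoms and peel branches inward.
    queue = deque(atom for atom, d in degree.items() if d <= 1)
    dangling = set()
    while queue:
        atom = queue.popleft()
        if atom in dangling:
            continue
        dangling.add(atom)
        for nb in neighbors[atom]:
            if nb not in dangling:
                degree[nb] -= 1
                if degree[nb] == 1:  # nb has just become a branch tip
                    queue.append(nb)
    total = sum(masses.values())
    return sum(masses[a] for a in dangling) / total
```

For a four-carbon ring carrying an –OH branch, the O and H atoms are pruned and the ring survives, giving DMR = 17.007 / 65.051 ≈ 0.26. Under this definition every terminal hydrogen on the backbone also counts as dangling; whether to exclude such atoms is a modeling choice this sketch leaves open.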
Correction for ‘Enhancing multifunctional drug screening via artificial intelligence’ by Junlin Dong et al., Digital Discovery, 2025, 4, 2012–2024, https://doi.org/10.1039/D5DD00082C.
The predictive performance of message-passing neural networks (MPNNs) for molecular property prediction can be improved by simplifying how messages are passed and by using descriptors that capture multiple aspects of molecular graphs. In this work, we designed model architectures that achieve state-of-the-art performance, surpassing more complex models such as those pre-trained on external databases. We assessed dataset diversity to complement our performance results and found that structural diversity influences whether additional components are needed in our MPNNs and feature sets. On most datasets, our best architecture uses bidirectional message passing with an attention mechanism, applied to a minimalist message formulation that excludes self-perception, showing that models simpler than classical MPNNs can yield higher class separability. In contrast, we found that convolution normalization factors do not benefit predictive power in all the datasets tested. This was corroborated for both global and node-level outputs. Additionally, we analyzed the effect of adding spatial features and of working with 3D graphs, and found that 2D molecular graphs are sufficient when complemented with appropriately chosen 3D descriptors. This approach not only preserves predictive performance but also reduces computational cost by over 50%, making it particularly advantageous for high-throughput screening campaigns.
‘Optimal message passing for molecular prediction is simple, attentive and spatial’ by Alma C. Castañeda-Leautaud and Rommie E. Amaro, Digital Discovery, 2025, 4(11), 3320–3338, https://doi.org/10.1039/D5DD00193E (published 16 October 2025).
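The best MPNN architecture described in the message-passing abstract combines bidirectional message passing, attention, and a message that excludes the node's own state ("self-perception"). A stripped-down sketch of one such update follows, under our own interpretation of those terms: every undirected bond carries messages in both directions, a node's new state is an attention-weighted sum of its neighbors' states only, and attention scores are scaled dot products. Real MPNN layers add learnable weight matrices, edge features and readout functions, all omitted here; the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_mp_step(h, edges):
    """One message-passing step without a self term.

    h:     dict node -> feature vector (list of floats)
    edges: undirected (u, v) pairs; each carries messages in both directions
    """
    nbrs = {v: [] for v in h}
    for u, v in edges:
        nbrs[u].append(v)  # message v -> u
        nbrs[v].append(u)  # message u -> v
    dim = len(next(iter(h.values())))
    out = {}
    for v, ns in nbrs.items():
        if not ns:
            out[v] = [0.0] * dim  # isolated node receives no messages
            continue
        # Scaled dot-product attention over neighbors; h[v] itself is never added to the sum.
        alpha = softmax([dot(h[v], h[u]) / math.sqrt(dim) for u in ns])
        out[v] = [sum(a * h[u][k] for a, u in zip(alpha, ns)) for k in range(dim)]
    return out
```

On the path graph a–b–c, node a's new state is exactly b's old state, while b's new state is a softmax-weighted blend of a and c. Dropping the self term forces each node's representation to be built entirely from its chemical environment.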