Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan
One of the main bottlenecks for the wide adoption of atomistic simulation pipelines in computational materials design is the high complexity of the workflows, which often require a diverse set of specialized toolkits and libraries. Here, we introduce a multi-agent artificial intelligence (AI) framework that autonomously performs end-to-end atomistic simulations, i.e. molecular dynamics (MD), with automated input and an associated full suite of analyses, using large language models (LLMs) and multiple specialized AI agents. Our system orchestrates the entire simulation pipeline, from structure generation via Atomsk and interatomic potential discovery through automated web mining, to simulation setup and execution using LAMMPS on high-performance computing (HPC) platforms. Post-simulation, our agentic framework performs automated data analysis and visualization with popular analysis tools like OVITO and Phonopy. Each expert agent operates within a defined role, equipped with domain-specific functions and a shared memory context for coordination. Using a diverse set of representative elemental and alloy systems, we demonstrate the capability of our framework to execute a range of static and dynamic materials modeling tasks, including lattice parameter and cohesive energy estimation, elastic constants computation, and phonon dispersion analysis, as well as MD simulations that determine dynamical properties to aid estimation of the melting point. The results produced by the agents show strong agreement with those obtained by a human expert, highlighting the reliability of the agentic approach. By combining automation, reproducibility, and human-in-the-loop control, our framework lowers the barrier to the widespread adoption of scalable, AI-driven discovery tools in materials science.
{"title":"Multi-agentic AI framework for end-to-end atomistic simulations","authors":"Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan","doi":"10.1039/D5DD00435G","DOIUrl":"https://doi.org/10.1039/D5DD00435G","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 440-452"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00435g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artem Mishchenko, Anupam Bhattacharya, Xiangwen Wang, Henry Kelbrick Pentz, Yihao Wei and Qian Yang
This review explores the impact of deep learning (DL) techniques on understanding and predicting electronic structures in two-dimensional (2D) materials. We highlight unique computational challenges posed by 2D materials and discuss how DL approaches – such as physics-aware models, generative AI, and inverse design – have significantly improved predictions of critical electronic properties, including band structures, density of states, and quantum transport phenomena. Through selected case studies, we illustrate how DL methods accelerate discoveries in emergent quantum phenomena, topology, superconductivity, and autonomous materials exploration. Finally, we outline promising future directions, stressing the need for robust data standardization and advocating for integrated frameworks that combine theoretical modeling, DL methods, and experimental validations.
{"title":"Deep learning methods for 2D material electronic properties","authors":"Artem Mishchenko, Anupam Bhattacharya, Xiangwen Wang, Henry Kelbrick Pentz, Yihao Wei and Qian Yang","doi":"10.1039/D5DD00155B","DOIUrl":"10.1039/D5DD00155B","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 28-63"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12720248/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145822210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamadreza Ramezani, Poulomi Nandi, Pablo Antonio De La Fuente-Moreno and Majid Beidaghi
The discovery of next-generation battery electrolytes increasingly involves complex, multicomponent formulations that demand high-throughput, systematic exploration. We present the Bayesian Robotic Investigator of Novel Electrolytes (BRINE), a cost-effective, self-driving laboratory (SDL) that autonomously prepares and tests mixed electrolyte solutions. BRINE combines an open-source liquid-handling robot with a potentiostat and custom-made electrodes to mix reagents and perform electrochemical measurements without human intervention. A Bayesian optimization routine navigates multidimensional composition spaces, allowing the platform to rapidly identify promising formulations. As a proof of concept, BRINE mapped ionic conductivity in two aqueous electrolyte spaces: (i) aqueous mixtures of NaCl, KCl, MgCl₂, and CaCl₂, and (ii) battery-oriented mixtures containing ZnCl₂, KCl, NH₄Cl, NaCl, and EMIMCl, testing ≈230 unique compositions in under 20 hours and finding conductivities up to 32.13 S m⁻¹. These results demonstrate how closed-loop autonomous experimentation and optimization accelerate the identification of electrolytes with the highest conductivity across a large multicomponent composition space, while minimizing experimental variability. This work lays the foundation for broader electrochemical studies using the BRINE platform.
{"title":"BRINE: a cost-effective electrochemical self-driving laboratory for accelerated discovery of high-performance electrolytes","authors":"Mohamadreza Ramezani, Poulomi Nandi, Pablo Antonio De La Fuente-Moreno and Majid Beidaghi","doi":"10.1039/D5DD00353A","DOIUrl":"https://doi.org/10.1039/D5DD00353A","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 397-406"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00353a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
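As a rough illustration of the closed-loop optimization described in the BRINE abstract above, the following sketch runs a Bayesian optimization loop over four-component compositions. Everything specific in it is an assumption rather than the authors' method: the Gaussian-process surrogate with an RBF kernel, the upper-confidence-bound acquisition rule, the random candidate pool, and in particular `synthetic_conductivity`, which is an invented stand-in for a real measurement.

```python
import numpy as np

rng = np.random.default_rng(0)


def synthetic_conductivity(x: np.ndarray) -> np.ndarray:
    """Hypothetical response surface; NOT real electrolyte data."""
    target = np.array([0.1, 0.6, 0.2, 0.1])  # made-up optimum composition
    return 30.0 * np.exp(-8.0 * np.sum((x - target) ** 2, axis=-1))


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 0.3) -> np.ndarray:
    """Squared-exponential kernel between two sets of compositions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise: float = 1e-4):
    """Posterior mean and standard deviation of a zero-mean GP on normalized targets."""
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-9
    y_norm = (y_train - y_mean) / y_std
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_norm)
    v = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.sum(k_qt * v.T, axis=1)
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean * y_std + y_mean, std * y_std


# Candidate pool: random mole fractions of four salts; each row sums to 1.
candidates = rng.dirichlet(np.ones(4), size=500)

# Seed with five random "measurements", then let the optimizer pick 20 more.
measured_idx = [int(i) for i in rng.choice(len(candidates), size=5, replace=False)]
for _ in range(20):
    x_train = candidates[measured_idx]
    y_train = synthetic_conductivity(x_train)
    mean, std = gp_posterior(x_train, y_train, candidates)
    ucb = mean + 2.0 * std        # favor high predicted value or high uncertainty
    ucb[measured_idx] = -np.inf   # never repeat a composition
    measured_idx.append(int(np.argmax(ucb)))

best_idx = max(measured_idx, key=lambda i: float(synthetic_conductivity(candidates[i])))
best_value = float(synthetic_conductivity(candidates[best_idx]))
print("best composition:", np.round(candidates[best_idx], 3))
print("best synthetic conductivity:", round(best_value, 2))
```

In a physical platform the call to `synthetic_conductivity` would be replaced by preparing the mixture and measuring it, which is the step the robot and potentiostat carry out.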
Tobias Kreiman and Aditi S. Krishnapriyan
Machine Learning Interatomic Potentials (MLIPs) are a promising alternative to expensive ab initio quantum mechanical molecular simulations. Given the diversity of chemical spaces that are of interest and the cost of generating new data, it is important to understand how universal MLIPs generalize beyond their training distributions. To characterize and better understand distribution shifts in MLIPs (that is, changes between the training and testing distributions), we conduct diagnostic experiments on chemical datasets, revealing common shifts that pose significant challenges, even for large universal models trained on extensive data. Based on these observations, we hypothesize that current supervised training methods inadequately regularize MLIPs, resulting in overfitting and learning poor representations of out-of-distribution systems. We then propose two new methods as initial steps for mitigating distribution shifts for MLIPs. Our methods focus on test-time refinement strategies that incur minimal computational cost and do not use expensive ab initio reference labels. The first strategy, based on spectral graph theory, modifies the edges of test graphs to align with graph structures seen during training. Our second strategy improves representations for out-of-distribution systems at test-time by taking gradient steps using an auxiliary objective, such as a cheap physical prior. Our test-time refinement strategies significantly reduce errors on out-of-distribution systems, suggesting that MLIPs are capable of modeling diverse chemical spaces and can move towards doing so, but are not being effectively trained for it. Our experiments establish clear benchmarks for evaluating the generalization capabilities of the next generation of MLIPs. Our code is available at https://tkreiman.github.io/projects/mlff_distribution_shifts/.
{"title":"Understanding and mitigating distribution shifts for universal machine learning interatomic potentials","authors":"Tobias Kreiman and Aditi S. Krishnapriyan","doi":"10.1039/D5DD00260E","DOIUrl":"https://doi.org/10.1039/D5DD00260E","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 415-439"},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00260e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Van Hout, Oliver Loveday, Jordi Morales-Vidal, Santiago Morandi and Núria López
Estimating the strength of adsorbate–surface bonds is key to the design of novel materials for heterogeneous catalysis. Machine learning (ML) methodologies have proven effective in rapidly and accurately evaluating adsorption energies on transition metal surfaces. However, the complexity of metal oxides and their diverse adsorbate–catalyst interactions hinder the sound transfer of ML approaches to these catalytically relevant materials. To address this challenge, we have evaluated the transferability of GAME-Net, a graph neural network developed for transition metals, by following an approach of increasing complexity, leading to GAME-Net-Ox. A density functional theory dataset was built with organic molecules on conductive (IrO₂ and RuO₂) and semiconductive (TiO₂) rutile oxides to evaluate GAME-Net's transferability. While the original GAME-Net failed to directly generalize between metals and metal oxides, GAME-Net-Ox trained exclusively on oxides achieved high accuracy (MAE = 0.16 eV), and both families of materials can be treated in GAME-Net-Ox with the same accuracy (MAE = 0.16 eV). This work demonstrates the adaptability of the GAME-Net architecture, enabling the screening of adsorbates on metal oxides, materials with complex electronic properties.
{"title":"Evaluating the transfer learning from metals to oxides with GAME-Net-Ox","authors":"Thomas Van Hout, Oliver Loveday, Jordi Morales-Vidal, Santiago Morandi and Núria López","doi":"10.1039/D5DD00331H","DOIUrl":"https://doi.org/10.1039/D5DD00331H","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 407-414"},"PeriodicalIF":6.2,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00331h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
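To make the accuracy metric quoted in the GAME-Net-Ox abstract above concrete, the short sketch below computes a mean absolute error (MAE) between predicted and reference adsorption energies. The four energy pairs are invented for illustration and were chosen so that the result matches the reported order of magnitude; they are not taken from the paper's dataset.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average of |predicted - reference| over all samples."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical adsorption energies in eV (made-up values, not from the paper).
dft_reference = [-1.20, -0.85, -2.10, -0.40]
model_predicted = [-1.05, -1.00, -2.30, -0.26]

mae = mean_absolute_error(model_predicted, dft_reference)
print(f"MAE = {mae:.2f} eV")  # absolute errors are 0.15, 0.15, 0.20, 0.14 eV
```

An MAE of this size means that, on average, a predicted adsorption energy differs from its DFT reference by roughly 0.16 eV in either direction.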
Aditya Ranganath, Hyojin Kim, Heesung Shim and Jonathan E. Allen
Machine learning models are often used as scoring functions to predict the binding affinity of a protein–ligand complex. These models are trained with limited amounts of data with experimentally measured binding affinity values. A large number of compounds are labeled inactive through single-concentration screens without measuring binding affinities. These inactive compounds, along with the active ones, can be used to train binary classification models, while regression models are trained using compounds with binding affinities only. However, the classification and regression tasks are often handled separately, without sharing the learned feature representations. In this paper, we propose a novel model architecture that jointly performs regression and classification objectives, aiming to maximize data utilization and improve predictive performance by leveraging two complementary tasks. In our setup, the regression yields the binding affinity, whereas the classification task yields the label as active or inactive. We demonstrate our method using PDBbind, the standard 3D structure database, as well as a dataset of flavivirus protease compounds with binding affinity data. Our experiments show that the new joint training strategy improves the accuracy of the model, increasing applicability in various practical drug screening scenarios.
{"title":"SLAB: simultaneous labeling and binding affinity prediction for protein–ligand structures","authors":"Aditya Ranganath, Hyojin Kim, Heesung Shim and Jonathan E. Allen","doi":"10.1039/D5DD00248F","DOIUrl":"https://doi.org/10.1039/D5DD00248F","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 375-383"},"PeriodicalIF":6.2,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00248f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
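The idea of training regression and classification together, as outlined in the SLAB abstract above, can be sketched as a single loss with two terms. This is a minimal sketch under stated assumptions, not the authors' architecture or loss: the function name `joint_loss`, the use of mean squared error and binary cross-entropy, the `None` convention for compounds without a measured affinity, and the weighting parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence


def joint_loss(
    pred_affinity: Sequence[float],
    pred_active_prob: Sequence[float],
    true_affinity: Sequence[Optional[float]],
    true_active: Sequence[int],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of a masked regression loss and a classification loss.

    true_affinity[i] is None when compound i has no measured binding affinity,
    for example an inactive found in a single-concentration screen.
    """
    eps = 1e-12  # guards against log(0)

    # Regression term: mean squared error over compounds with a measured affinity.
    sq_errors = [
        (p - t) ** 2 for p, t in zip(pred_affinity, true_affinity) if t is not None
    ]
    mse = sum(sq_errors) / len(sq_errors) if sq_errors else 0.0

    # Classification term: binary cross-entropy over every compound.
    bce_terms = [
        -(y * math.log(max(q, eps)) + (1 - y) * math.log(max(1.0 - q, eps)))
        for q, y in zip(pred_active_prob, true_active)
    ]
    bce = sum(bce_terms) / len(bce_terms)

    return alpha * mse + (1.0 - alpha) * bce


# Two compounds: one active with a measured affinity, one inactive without.
loss = joint_loss(
    pred_affinity=[5.0, 3.0],
    pred_active_prob=[0.9, 0.2],
    true_affinity=[7.0, None],
    true_active=[1, 0],
)
print(f"joint loss = {loss:.4f}")
```

The masking is the point of the design: the inactive compound contributes to the classification term even though it has no affinity label, so both kinds of data shape the shared representation.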
Philipp Schleich, Lasse Bjørn Kristensen, Jorge A. Campos-Gonzalez-Angulo, Abdulrahman Aldossary, Davide Avagliano, Mohsen Bagherimehrab, Christoph Gorgulla, Joe Fitzsimons and Alán Aspuru-Guzik
Simulating chemical systems is highly sought after and computationally challenging, as the number of degrees of freedom increases exponentially with the size of the system. Quantum computers have been proposed as a computational means to overcome this bottleneck, thanks to their capability of representing this amount of information efficiently. Most efforts so far have been centered around determining the ground states of chemical systems. However, hardness results and the lack of theoretical guarantees for efficient heuristics for initial-state generation cast doubt on the feasibility. Here, we propose a heuristically guided approach that is based on inherently efficient routines to solve chemical simulation problems, requiring quantum circuits whose size scales polynomially in relevant system parameters. If a set of assumptions can be satisfied, our approach finds good initial states for dynamics simulation by assembling them in a scattering tree. In particular, we investigate a scattering-based state preparation approach within the context of mergo-association. We discuss a variety of quantities of chemical interest that can be measured after the quantum simulation of a process, e.g., a reaction, following its corresponding initial state preparation.
{"title":"Chemically motivated simulation problems are efficiently solvable on a quantum computer","authors":"Philipp Schleich, Lasse Bjørn Kristensen, Jorge A. Campos-Gonzalez-Angulo, Abdulrahman Aldossary, Davide Avagliano, Mohsen Bagherimehrab, Christoph Gorgulla, Joe Fitzsimons and Alán Aspuru-Guzik","doi":"10.1039/D5DD00377F","DOIUrl":"https://doi.org/10.1039/D5DD00377F","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 64-87"},"PeriodicalIF":6.2,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00377f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sitanan Sartyoungkul, Balasubramaniyan Sakthivel, Pavel Sidorov and Yuuya Nagata
The integration of automated synthesis and machine learning (ML) is transforming analytical chemistry by enabling data-driven approaches to method development. Chromatographic column selection, a critical yet time-consuming step in separation science, stands to benefit substantially from such advances. Here, we report a workflow that combines automated synthesis of a structurally diverse amide library with fragment descriptor-based ML for retention time prediction in supercritical fluid chromatography (SFC). Retention data were systematically acquired on the recently developed DCpak® PBT column, providing one of the first structured datasets for this stationary phase. Benchmarking revealed that fragment-count descriptors (ChyLine and CircuS) substantially outperformed conventional molecular fingerprints, delivering higher predictive accuracy and more interpretable relationships between substructures and retention behavior. External validation underscored the role of chemical space coverage, while visualization techniques such as ColorAtom analysis offered mechanistic insight into model decisions. By uniting automated synthesis with chemoinformatics-driven ML, this study demonstrates a scalable approach to generating high-quality training data and predictive models for chromatography. Beyond retention prediction, the framework exemplifies how data-centric strategies can accelerate column characterization, reduce reliance on trial-and-error experimentation, and advance the development of autonomous, high-throughput analytical workflows.
{"title":"Automated synthesis and fragment descriptor-based machine learning for retention time prediction in supercritical fluid chromatography","authors":"Sitanan Sartyoungkul, Balasubramaniyan Sakthivel, Pavel Sidorov and Yuuya Nagata","doi":"10.1039/D5DD00437C","DOIUrl":"https://doi.org/10.1039/D5DD00437C","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 310-316"},"PeriodicalIF":6.2,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00437c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146007005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankur K. Gupta, Caitlin V. Hetherington and Wibe A. de Jong
The separation of rare-earth metals, vital for numerous advanced technologies, is hampered by their similar chemical properties, making ligand discovery a significant challenge. Traditional experimental and quantum chemistry approaches for identifying effective ligands are often resource-intensive. We introduce a machine learning protocol based on an equivariant neural network, Allegro, for the rapid and accurate prediction of binding energies in rare-earth complexes. Key to this work is our newly curated dataset of rare-earth metal complexes, made publicly available to foster further research and systematically generated using the Architector program. This dataset distinctively features functionalized derivatives of proven rare-earth-chelating scaffolds, hydroxypyridinone (HOPO), catecholamide (CAM), and their thio-analogues, selected for their established efficacy in binding these elements. Trained on this valuable resource, our Allegro models demonstrate excellent performance, particularly when trained to directly predict DFT-level binding energies, yielding highly accurate results that closely correlate with theoretical calculations on a diverse test set. Furthermore, this strategy exhibited strong out-of-sample generalization, accurately predicting binding energies for an isomeric HOPO-derivative ligand not seen during training. By substantially reducing computational demands, this machine learning framework, alongside the provided dataset, represents a powerful tool to accelerate the high-throughput screening and rational design of novel ligands for efficient rare-earth metal separation.
{"title":"Toward accelerating rare-earth metal extraction using equivariant neural networks","authors":"Ankur K. Gupta, Caitlin V. Hetherington and Wibe A. de Jong","doi":"10.1039/D5DD00286A","DOIUrl":"https://doi.org/10.1039/D5DD00286A","url":null,"abstract":"<p >The separation of rare-earth metals, vital for numerous advanced technologies, is hampered by their similar chemical properties, making ligand discovery a significant challenge. Traditional experimental and quantum chemistry approaches for identifying effective ligands are often resource-intensive. We introduce a machine learning protocol based on an equivariant neural network, Allegro, for the rapid and accurate prediction of binding energies in rare-earth complexes. Key to this work is our newly curated dataset of rare-earth metal complexes—made publicly available to foster further research—systematically generated using the <em>Architector</em> program. This dataset distinctively features functionalized derivatives of proven rare-earth-chelating scaffolds, hydroxypyridinone (HOPO), catecholamide (CAM), and their thio-analogues, selected for their established efficacy in binding these elements. Trained on this valuable resource, our Allegro models demonstrate excellent performance, particularly when trained to directly predict DFT-level binding energies, yielding highly accurate results that closely correlate with theoretical calculations on a diverse test set. Furthermore, this strategy exhibited strong out-of-sample generalization, accurately predicting binding energies for an isomeric HOPO-derivative ligand not seen during training. 
By substantially reducing computational demands, this machine learning framework and the provided dataset represent powerful tools to accelerate the high-throughput screening and rational design of novel ligands for efficient rare-earth metal separation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 363-374"},"PeriodicalIF":6.2,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00286a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salvatore Sorrentino, Alessandro Gussoni, Francesco Calcagno, Gioele Pasotti, Davide Avagliano, Ivan Rivalta, Marco Garavelli and Dario Polli
Raman spectroscopy is a powerful technique for probing molecular vibrations, yet the computational prediction of Raman spectra remains challenging due to the high cost of quantum chemical methods and the complexity of structure–spectrum relationships. Here, we introduce Mol2Raman, a deep-learning framework that predicts spontaneous Raman spectra directly from SMILES representations of molecules. The model leverages Graph Isomorphism Networks with edge features (GINE) to encode molecular topology and bond characteristics, enabling accurate prediction of both peak positions and intensities across diverse chemical structures. Trained on a novel dataset of over 31 000 molecules with state-of-the-art Density Functional Theory (DFT)-calculated Raman spectra, Mol2Raman outperforms both fingerprint-based similarity models and Chemprop-based neural networks. It achieves high fidelity in reproducing spectral features, including for molecules with low structural similarity to the training set and for enantiomeric inversion. The model offers fast inference (22 ms per molecule), making it suitable for high-throughput molecular screening. We further deploy Mol2Raman as an open-access web application, enabling real-time predictions without specialized hardware. This work establishes a scalable, accurate, and interpretable platform for Raman spectral prediction, opening new opportunities in molecular design, materials discovery, and spectroscopic diagnostics.
{"title":"Mol2Raman: a graph neural network model for predicting Raman spectra from SMILES representations","authors":"Salvatore Sorrentino, Alessandro Gussoni, Francesco Calcagno, Gioele Pasotti, Davide Avagliano, Ivan Rivalta, Marco Garavelli and Dario Polli","doi":"10.1039/D5DD00210A","DOIUrl":"10.1039/D5DD00210A","url":null,"abstract":"<p >Raman spectroscopy is a powerful technique for probing molecular vibrations, yet the computational prediction of Raman spectra remains challenging due to the high cost of quantum chemical methods and the complexity of structure–spectrum relationships. Here, we introduce Mol2Raman, a deep-learning framework that predicts spontaneous Raman spectra directly from SMILES representations of molecules. The model leverages Graph Isomorphism Networks with edge features (GINE) to encode molecular topology and bond characteristics, enabling accurate prediction of both peak positions and intensities across diverse chemical structures. Trained on a novel dataset of over 31 000 molecules with state-of-the-art Density Functional Theory (DFT)-calculated Raman spectra, Mol2Raman outperforms both fingerprint-based similarity models and Chemprop-based neural networks. It achieves a high fidelity in reproducing spectral features, including for molecules with low structural similarity to the training set and for enantiomeric inversion. The model offers fast inference times (22 ms per molecule), making it suitable for high-throughput molecular screening. We further deploy Mol2Raman as an open-access web application, enabling real-time predictions without specialized hardware. 
This work establishes a scalable, accurate, and interpretable platform for Raman spectral prediction, opening new opportunities in molecular design, materials discovery, and spectroscopic diagnostics.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 161-176"},"PeriodicalIF":6.2,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12691243/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145745937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}