Shendong Tan, Bochun Liang, Dexin Lu, Chaoyuan Ji, Wenke Ji, Zihui Li and Tingzheng Hou
Solid polymer electrolytes are limited by low room-temperature ionic conductivity and narrow electrochemical stability windows. While molecular simulations and electronic-structure theory can probe these key properties at the molecular scale, the field currently lacks integrated, automated tools for end-to-end assessment. We introduce polymer electrolyte modeling and discovery (PEMD), an open-source Python framework that unifies polymer construction, force field parameterization, multiscale simulation, and property analysis for polymer electrolytes. The comprehensive analysis suite spans transport properties, transport mechanisms, and electrochemical stability. PEMD achieves a 100% success rate in constructing a collection of 656 homopolymers. The automated molecular dynamics workflow reproduces experimental ionic conductivities for 18 reported systems (Spearman ρ = 0.819; MAE = 0.684 in log10(S cm⁻¹)). Specifically, for poly(ethylene oxide)/LiTFSI electrolytes, PEMD captures the canonical non-monotonic dependence of ionic conductivity on salt concentration with built-in default settings. The workflow is further applied at scale to compute ionic conductivities for 200 polymer electrolytes. Moreover, automated oxidation window screening on 15 representative polymer electrolytes recovers experimental rankings for the oxidation potential (Spearman ρ = 0.754; MAE = 0.473 V). With standardized protocols and traceable workflows, PEMD provides a reliable platform for high-throughput screening and data-driven design of solid polymer electrolytes.
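The two agreement metrics quoted in this abstract (Spearman ρ and MAE in log10 conductivity units) can be illustrated with a minimal sketch. This is not PEMD code: the helper names and the four conductivity values are made up for illustration, and only the Python standard library is used.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mae_log10(predicted, measured):
    """Mean absolute error after taking log10 of both series."""
    errors = [abs(math.log10(p) - math.log10(m))
              for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)

# Hypothetical ionic conductivities in S/cm (illustrative values only).
simulated = [4e-4, 3e-4, 2e-6, 5e-3]
measured = [2e-5, 1e-4, 1e-6, 1e-3]

rho = spearman_rho(simulated, measured)
mae = mae_log10(simulated, measured)
print(f"Spearman rho = {rho:.3f}, MAE = {mae:.3f} log10(S/cm)")
```

Working in log10 space means an MAE of about 0.68 corresponds to predictions that are off by a factor of roughly five on average, which is why rank correlation is reported alongside it for screening purposes.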
{"title":"PEMD: a high-throughput simulation and analysis framework for solid polymer electrolytes","authors":"Shendong Tan, Bochun Liang, Dexin Lu, Chaoyuan Ji, Wenke Ji, Zihui Li and Tingzheng Hou","doi":"10.1039/D5DD00454C","DOIUrl":"https://doi.org/10.1039/D5DD00454C","journal":{"name":"Digital discovery","volume":"1","pages":"193-202"},"PeriodicalIF":6.2,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00454c?page=search"}
Gabriel Greene-Diniz, Georgia Prokopiou, David Zsolt Manrique and David Muñoz Ramo
The ability to prepare states for quantum chemistry is a promising feature of quantum computers, and efficient techniques for chemical state preparation are an active area of research. In this paper, we implement and investigate two methods of quantum circuit preparation for multiconfigurational states for quantum chemical applications. It has previously been shown that controlled Givens rotations are universal for quantum chemistry. To prepare a selected linear combination of Slater determinants (represented as occupation number configurations) using Givens rotations, the gates that rotate between the reference and excited determinants generally need to be controlled on qubits outside the excitation (external controls). We implement a method to automatically find the external controls required for utilizing Givens rotations to prepare multiconfigurational states on a quantum circuit. We compare this approach to an alternative technique that exploits the sparsity of the chemical state vector and find that the latter can outperform the method of externally controlled Givens rotations; highly reduced circuits can be obtained by taking advantage of the sparse nature of chemical wavefunctions (where the number of basis states is significantly less than 2^(n_q) for n_q qubits). We demonstrate the benefits of these techniques in a range of applications, including the ground states of a strongly correlated molecule, matrix elements of the Q-SCEOM algorithm for excited states, as well as correlated initial states for a quantum subspace method based on quantum computed moments and quantum phase estimation.
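As a toy illustration of the Givens-rotation idea in this abstract, the sketch below prepares a two-determinant state on two qubits with plain Python lists. It is not the authors' implementation: the sign convention of the rotation, the basis ordering (|00>, |01>, |10>, |11>), the helper names, and the target amplitudes are all assumptions, and no external controls are needed in this smallest case.

```python
import math

def givens_matrix(theta):
    """4x4 Givens rotation mixing |01> and |10>; |00> and |11> are untouched.

    The sign convention here is one common choice, picked for illustration.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c, s, 0.0],
        [0.0, -s, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(matrix, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]

# Hypothetical target: a|10> + b|01> with a^2 + b^2 = 1.
a, b = 0.8, 0.6
theta = math.atan2(b, a)

reference = [0.0, 0.0, 1.0, 0.0]  # reference determinant |10>
prepared = apply(givens_matrix(theta), reference)

# Sparsity check: count nonzero amplitudes against the full dimension 2**n_q.
n_q = 2
nonzero = sum(1 for amp in prepared if abs(amp) > 1e-12)
print(prepared, f"{nonzero} of {2 ** n_q} basis states populated")
```

With more qubits and more determinants, a rotation between two configurations would also act on other configurations that share the same excitation pattern, which is where the external controls described in the abstract come in.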
{"title":"Quantum state preparation of multiconfigurational states for quantum chemistry","authors":"Gabriel Greene-Diniz, Georgia Prokopiou, David Zsolt Manrique and David Muñoz Ramo","doi":"10.1039/D5DD00350D","DOIUrl":"https://doi.org/10.1039/D5DD00350D","journal":{"name":"Digital discovery","volume":"1","pages":"134-152"},"PeriodicalIF":6.2,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00350d?page=search"}
Hossein Mashhadimoslem, Mohammad Ali Abdol, Kourosh Zanganeh, Ahmed Shafeen, Encheng Liu, Sohrab Zendehboudi, Ali Elkamel and Aiping Yu
This research focuses on efficiently collecting CO2 adsorption data using experimental metal–organic framework (MOF) porous materials from the scientific literature, addressing the challenges related to data classification and access to MOF synthesis methods. The aim is to organize, classify, and facilitate easy access to materials science information using artificial intelligence (AI). Using advanced large language models (LLMs), we developed a systematic approach to extract and sort MOF synthesis data for CO2 adsorption in a structured format. Using this method, we collected data from over 433 published experimental research papers and created a specific dataset to analyze the effects of metals, ligands, and carbon adsorption conditions on CO2 uptake performance. The correlations between the material structure, such as metal types, ligands, specific surface area, pore size, pore volume, synthesis conditions, and CO2 adsorption, under various process conditions were examined using the final database. We applied ChatGPT 4o mini as an AI assistant to text-mine all MOF information from different PDF file references. In addition to revealing the impact of each parameter on CO2 uptake and MOF structure before synthesis, the AI analysis findings indicated which ligand and metal groups should be altered to customize the MOF structure for improved CO2 capture.
{"title":"Toward smart CO2 capture by the synthesis of metal organic frameworks using large language models","authors":"Hossein Mashhadimoslem, Mohammad Ali Abdol, Kourosh Zanganeh, Ahmed Shafeen, Encheng Liu, Sohrab Zendehboudi, Ali Elkamel and Aiping Yu","doi":"10.1039/D5DD00446B","DOIUrl":"https://doi.org/10.1039/D5DD00446B","journal":{"name":"Digital discovery","volume":"1","pages":"384-396"},"PeriodicalIF":6.2,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00446b?page=search"}
Wenkai Ning, Musen Li, Jeffrey R. Reimers and Rika Kobayashi
Large Language Models (LLMs) are increasingly utilized for large-scale extraction and organization of unstructured data owing to their exceptional Natural Language Processing (NLP) capabilities. Vast amounts of data from experiments and simulations that could empower materials design are scattered across numerous scientific publications, but high-quality experimental databases are scarce. This study considers the effectiveness and practicality of five representative AI tools (ChemDataExtractor, BERT-PSIE, ChatExtract, LangChain, and Kimi) to extract bandgaps from 200 randomly selected materials science publications in two presentations (arXiv and publisher versions), comparing the results to those obtained by human processing. Although the integrity of data extraction has not met expectations, encouraging results have been achieved in terms of precision and the ability to eliminate irrelevant papers from human consideration. Our analysis highlights both the strengths and limitations of these tools, offering insights into improving future data extraction techniques for enhanced scientific discovery and innovation. In conjunction with recent research, we provide guidance on feasible improvements for future data extraction methodologies, helping to bridge the gap between unstructured scientific data and structured, actionable databases.
{"title":"Optimizing data extraction from materials science literature: a study of tools using large language models","authors":"Wenkai Ning, Musen Li, Jeffrey R. Reimers and Rika Kobayashi","doi":"10.1039/D5DD00482A","DOIUrl":"https://doi.org/10.1039/D5DD00482A","journal":{"name":"Digital discovery","volume":"2","pages":"698-715"},"PeriodicalIF":6.2,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00482a?page=search"}
Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan
One of the main bottlenecks for the wide adoption of atomistic simulation pipelines for computational materials design is the high complexity of the workflows, which often requires the use of a diverse set of specialized toolkits and libraries. Here, we introduce a multi-agent artificial intelligence (AI) framework that autonomously performs end-to-end atomistic simulations, i.e. molecular dynamics (MD), with automated input and an associated full suite of analyses, using large language models (LLMs) and multiple specialized AI agents. Our system orchestrates the entire simulation pipeline, from structure generation via Atomsk and interatomic potential discovery through automated web mining, to simulation setup and execution using LAMMPS on high-performance computing (HPC) platforms. Post-simulation, our agentic framework performs automated data analysis and visualization with popular analysis tools like OVITO and Phonopy. Each expert agent operates within a defined role, equipped with domain-specific functions and a shared memory context for coordination. Using a diverse set of representative elemental and alloy systems, we demonstrate the capability of our framework to execute a range of static and dynamic materials modeling tasks, including lattice parameter and cohesive energy estimation, elastic constants computation, phonon dispersion analysis, and MD simulations to determine dynamical properties that aid estimation of the melting point. The results produced by the agents show strong agreement with those obtained by a human expert, highlighting the reliability of the agentic approach. By combining automation, reproducibility, and human-in-the-loop control, our framework lowers the barrier to the widespread adoption of scalable, AI-driven discovery tools in materials science.
{"title":"Multi-agentic AI framework for end-to-end atomistic simulations","authors":"Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan and Subramanian K. R. S. Sankaranarayanan","doi":"10.1039/D5DD00435G","DOIUrl":"https://doi.org/10.1039/D5DD00435G","journal":{"name":"Digital discovery","volume":"1","pages":"440-452"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00435g?page=search"}
Artem Mishchenko, Anupam Bhattacharya, Xiangwen Wang, Henry Kelbrick Pentz, Yihao Wei and Qian Yang
This review explores the impact of deep learning (DL) techniques on understanding and predicting electronic structures in two-dimensional (2D) materials. We highlight unique computational challenges posed by 2D materials and discuss how DL approaches – such as physics-aware models, generative AI, and inverse design – have significantly improved predictions of critical electronic properties, including band structures, density of states, and quantum transport phenomena. Through selected case studies, we illustrate how DL methods accelerate discoveries in emergent quantum phenomena, topology, superconductivity, and autonomous materials exploration. Finally, we outline promising future directions, stressing the need for robust data standardization and advocating for integrated frameworks that combine theoretical modeling, DL methods, and experimental validations.
{"title":"Deep learning methods for 2D material electronic properties","authors":"Artem Mishchenko, Anupam Bhattacharya, Xiangwen Wang, Henry Kelbrick Pentz, Yihao Wei and Qian Yang","doi":"10.1039/D5DD00155B","DOIUrl":"https://doi.org/10.1039/D5DD00155B","journal":{"name":"Digital discovery","volume":"1","pages":"28-63"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12720248/pdf/"}
Mohamadreza Ramezani, Poulomi Nandi, Pablo Antonio De La Fuente-Moreno and Majid Beidaghi
The discovery of next-generation battery electrolytes increasingly involves complex, multicomponent formulations that demand high-throughput, systematic exploration. We present the Bayesian Robotic Investigator of Novel Electrolytes (BRINE), a cost-effective, self-driving laboratory (SDL) that autonomously prepares and tests mixed electrolyte solutions. BRINE combines an open-source liquid-handling robot with a potentiostat and custom-made electrodes to mix reagents and perform electrochemical measurements without human intervention. A Bayesian optimization routine navigates multidimensional composition spaces, allowing the platform to rapidly identify promising formulations. As a proof of concept, BRINE mapped ionic conductivity in two aqueous electrolyte spaces: (i) mixtures of NaCl, KCl, MgCl2, and CaCl2, and (ii) battery-oriented mixtures containing ZnCl2, KCl, NH4Cl, NaCl, and EMIMCl, testing ≈230 unique compositions in under 20 hours and finding conductivities up to 32.13 S m⁻¹. These results demonstrate how closed-loop autonomous experimentation and optimization accelerate the identification of electrolytes with the highest conductivity across a large multicomponent composition space, while minimizing experimental variability. This work lays the foundation for broader electrochemical studies using the BRINE platform.
{"title":"BRINE: a cost-effective electrochemical self-driving laboratory for accelerated discovery of high-performance electrolytes","authors":"Mohamadreza Ramezani, Poulomi Nandi, Pablo Antonio De La Fuente-Moreno and Majid Beidaghi","doi":"10.1039/D5DD00353A","DOIUrl":"https://doi.org/10.1039/D5DD00353A","journal":{"name":"Digital discovery","volume":"1","pages":"397-406"},"PeriodicalIF":6.2,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00353a?page=search"}
Machine Learning Interatomic Potentials (MLIPs) are a promising alternative to expensive ab initio quantum mechanical molecular simulations. Given the diversity of chemical spaces that are of interest and the cost of generating new data, it is important to understand how universal MLIPs generalize beyond their training distributions. In order to characterize and better understand distribution shifts in MLIPs—that is, changes between the training and testing distributions—we conduct diagnostic experiments on chemical datasets, revealing common shifts that pose significant challenges, even for large universal models trained on extensive data. Based on these observations, we hypothesize that current supervised training methods inadequately regularize MLIPs, resulting in overfitting and learning poor representations of out-of-distribution systems. We then propose two new methods as initial steps for mitigating distribution shifts for MLIPs. Our methods focus on test-time refinement strategies that incur minimal computational cost and do not use expensive ab initio reference labels. The first strategy, based on spectral graph theory, modifies the edges of test graphs to align with graph structures seen during training. Our second strategy improves representations for out-of-distribution systems at test-time by taking gradient steps using an auxiliary objective, such as a cheap physical prior. Our test-time refinement strategies significantly reduce errors on out-of-distribution systems, suggesting that MLIPs are capable of and can move towards modeling diverse chemical spaces, but are not being effectively trained to do so. Our experiments establish clear benchmarks for evaluating the generalization capabilities of the next generation of MLIPs. Our code is available at https://tkreiman.github.io/projects/mlff_distribution_shifts/.
{"title":"Understanding and mitigating distribution shifts for universal machine learning interatomic potentials","authors":"Tobias Kreiman and Aditi S. Krishnapriyan","doi":"10.1039/D5DD00260E","DOIUrl":"https://doi.org/10.1039/D5DD00260E","journal":{"name":"Digital discovery","volume":"1","pages":"415-439"},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00260e?page=search"}
Thomas Van Hout, Oliver Loveday, Jordi Morales-Vidal, Santiago Morandi and Núria López
Estimating how strongly adsorbates bind to a surface is key to the design of novel materials for heterogeneous catalysis. Machine learning (ML) methodologies have proven effective in rapidly and accurately evaluating adsorption energies on transition metal surfaces. However, the complexity of metal oxides and their diverse adsorbate–catalyst interactions hinder the sound transfer of ML approaches to these catalytically relevant materials. To address this challenge, we have evaluated the transferability of GAME-Net, a graph neural network developed for transition metals, by following an approach of increasing complexity, leading to GAME-Net-Ox. A density functional theory dataset was built with organic molecules on conductive (IrO2 and RuO2) and semiconductive (TiO2) rutile oxides to evaluate GAME-Net's transferability. While the original GAME-Net failed to directly generalize between metals and metal oxides, GAME-Net-Ox trained exclusively on oxides achieved high accuracy (MAE = 0.16 eV), and GAME-Net-Ox can treat both families of materials with the same accuracy (MAE = 0.16 eV). This work demonstrates the adaptability of the GAME-Net architecture, enabling the screening of adsorbates on metal oxides, materials with complex electronic properties.
{"title":"Evaluating the transfer learning from metals to oxides with GAME-Net-Ox","authors":"Thomas Van Hout, Oliver Loveday, Jordi Morales-Vidal, Santiago Morandi and Núria López","doi":"10.1039/D5DD00331H","DOIUrl":"https://doi.org/10.1039/D5DD00331H","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"1","pages":"407-414"},"PeriodicalIF":6.2,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00331h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aditya Ranganath, Hyojin Kim, Heesung Shim and Jonathan E. Allen
Machine learning models are often used as scoring functions to predict the binding affinity of a protein–ligand complex. These models are trained on the limited data for which binding affinities have been measured experimentally. A large number of compounds are labeled inactive through single-concentration screens without measuring binding affinities. These inactive compounds, along with the active ones, can be used to train binary classification models, whereas regression models can be trained only on compounds with measured binding affinities. However, the classification and regression tasks are often handled separately, without sharing the learned feature representations. In this paper, we propose a novel model architecture that jointly optimizes regression and classification objectives, aiming to maximize data utilization and improve predictive performance by leveraging two complementary tasks. In our setup, the regression task yields the binding affinity, whereas the classification task labels the compound as active or inactive. We demonstrate our method using PDBbind, the standard 3D structure database, as well as a dataset of flavivirus protease compounds with binding affinity data. Our experiments show that the new joint training strategy improves the accuracy of the model, increasing applicability in various practical drug screening scenarios.
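One way to picture the joint objective described above is a masked multi-task loss, in which every compound contributes to the classification term but only compounds with a measured affinity contribute to the regression term. The sketch below is an illustration under that assumption, not the SLAB architecture or its actual loss; the function name `joint_loss`, its arguments, and the `weight` parameter are hypothetical.

```python
import numpy as np

def joint_loss(affinity_pred, activity_logit, affinity_true, is_active,
               has_affinity, weight=1.0):
    """Masked regression loss plus binary cross-entropy over all compounds."""
    # Regression term: mean squared error over compounds with a measured affinity.
    mask = has_affinity.astype(float)
    n_measured = max(mask.sum(), 1.0)  # avoid division by zero when nothing is measured
    squared_error = (affinity_pred - np.nan_to_num(affinity_true)) ** 2
    mse = float(np.sum(mask * squared_error) / n_measured)
    # Classification term: numerically stable cross-entropy on logits,
    # log(1 + exp(z)) - y * z, averaged over every compound.
    bce = float(np.mean(np.logaddexp(0.0, activity_logit) - is_active * activity_logit))
    return mse + weight * bce

# Toy batch: one active compound with a measured affinity, two screened inactives.
affinity_pred = np.array([6.0, 5.0, 4.0])
activity_logit = np.array([0.0, 0.0, 0.0])
affinity_true = np.array([7.0, np.nan, np.nan])  # NaN marks "not measured"
is_active = np.array([1.0, 0.0, 0.0])
has_affinity = np.array([True, False, False])
loss = joint_loss(affinity_pred, activity_logit, affinity_true, is_active, has_affinity)
```

Because the regression term is masked, the predicted affinity for an unmeasured compound does not change the loss, while its active/inactive label still shapes the representation shared by both tasks.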
{"title":"SLAB: simultaneous labeling and binding affinity prediction for protein–ligand structures","authors":"Aditya Ranganath, Hyojin Kim, Heesung Shim and Jonathan E. Allen","doi":"10.1039/D5DD00248F","DOIUrl":"https://doi.org/10.1039/D5DD00248F","url":null,"PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"1","pages":"375-383"},"PeriodicalIF":6.2,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00248f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}