Philipp Schleich, Lasse Bjørn Kristensen, Jorge A. Campos-Gonzalez-Angulo, Abdulrahman Aldossary, Davide Avagliano, Mohsen Bagherimehrab, Christoph Gorgulla, Joe Fitzsimons and Alán Aspuru-Guzik
Simulating chemical systems is highly sought after and computationally challenging, as the number of degrees of freedom increases exponentially with the size of the system. Quantum computers have been proposed as a computational means to overcome this bottleneck, thanks to their capability of representing this amount of information efficiently. Most efforts so far have centered on determining the ground states of chemical systems. However, hardness results and the lack of theoretical guarantees for efficient initial-state generation heuristics cast doubt on the feasibility of this task. Here, we propose a heuristically guided approach that is based on inherently efficient routines to solve chemical simulation problems, requiring quantum circuits whose size scales polynomially in the relevant system parameters. If a set of assumptions can be satisfied, our approach finds good initial states for dynamics simulation by assembling them in a scattering tree. In particular, we investigate a scattering-based state preparation approach within the context of mergo-association. We discuss a variety of quantities of chemical interest that can be measured after the quantum simulation of a process, e.g., a reaction, following its corresponding initial state preparation.
Title: Chemically motivated simulation problems are efficiently solvable on a quantum computer. Journal: Digital Discovery, vol. 1, pp. 64-87. Published: 2025-11-27. DOI: 10.1039/D5DD00377F (https://doi.org/10.1039/D5DD00377F). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00377f?page=search
Sitanan Sartyoungkul, Balasubramaniyan Sakthivel, Pavel Sidorov and Yuuya Nagata
The integration of automated synthesis and machine learning (ML) is transforming analytical chemistry by enabling data-driven approaches to method development. Chromatographic column selection, a critical yet time-consuming step in separation science, stands to benefit substantially from such advances. Here, we report a workflow that combines automated synthesis of a structurally diverse amide library with fragment descriptor-based ML for retention time prediction in supercritical fluid chromatography (SFC). Retention data were systematically acquired on the recently developed DCpak® PBT column, providing one of the first structured datasets for this stationary phase. Benchmarking revealed that fragment-count descriptors (ChyLine and CircuS) substantially outperformed conventional molecular fingerprints, delivering higher predictive accuracy and more interpretable relationships between substructures and retention behavior. External validation underscored the role of chemical space coverage, while visualization techniques such as ColorAtom analysis offered mechanistic insight into model decisions. By uniting automated synthesis with chemoinformatics-driven ML, this study demonstrates a scalable approach to generating high-quality training data and predictive models for chromatography. Beyond retention prediction, the framework exemplifies how data-centric strategies can accelerate column characterization, reduce reliance on trial-and-error experimentation, and advance the development of autonomous, high-throughput analytical workflows.
Title: Automated synthesis and fragment descriptor-based machine learning for retention time prediction in supercritical fluid chromatography. Journal: Digital Discovery, vol. 1, pp. 310-316. Published: 2025-11-26. DOI: 10.1039/D5DD00437C (https://doi.org/10.1039/D5DD00437C). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00437c?page=search
Ankur K. Gupta, Caitlin V. Hetherington and Wibe A. de Jong
The separation of rare-earth metals, vital for numerous advanced technologies, is hampered by their similar chemical properties, making ligand discovery a significant challenge. Traditional experimental and quantum chemistry approaches for identifying effective ligands are often resource-intensive. We introduce a machine learning protocol based on an equivariant neural network, Allegro, for the rapid and accurate prediction of binding energies in rare-earth complexes. Key to this work is our newly curated dataset of rare-earth metal complexes (made publicly available to foster further research), systematically generated using the Architector program. This dataset distinctively features functionalized derivatives of proven rare-earth-chelating scaffolds, hydroxypyridinone (HOPO), catecholamide (CAM), and their thio-analogues, selected for their established efficacy in binding these elements. Trained on this resource, our Allegro models demonstrate excellent performance, particularly when trained to directly predict DFT-level binding energies, yielding highly accurate results that closely correlate with theoretical calculations on a diverse test set. Furthermore, this strategy exhibited strong out-of-sample generalization, accurately predicting binding energies for an isomeric HOPO-derivative ligand not seen during training. By substantially reducing computational demands, this machine learning framework and the accompanying dataset provide powerful tools to accelerate the high-throughput screening and rational design of novel ligands for efficient rare-earth metal separation.
Title: Toward accelerating rare-earth metal extraction using equivariant neural networks. Journal: Digital Discovery, vol. 1, pp. 363-374. Published: 2025-11-26. DOI: 10.1039/D5DD00286A (https://doi.org/10.1039/D5DD00286A). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00286a?page=search
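The rare-earth abstract above does not spell out how a binding energy is defined. As a point of orientation only, a convention commonly used in computational chemistry (assumed here, not taken from the paper) takes the energy of the assembled complex minus the energies of its separated parts; the symbols below are illustrative:

```latex
% Assumed convention, not confirmed by the source:
% a more negative value indicates stronger binding.
\[
  \Delta E_{\mathrm{bind}}
  = E_{\mathrm{complex}}
  - \Bigl( E_{\mathrm{metal}} + \sum_{k=1}^{n} E_{\mathrm{ligand},k} \Bigr)
\]
```

Under this convention, a model trained to predict the binding energy directly learns the difference itself rather than the three total energies separately.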
Salvatore Sorrentino, Alessandro Gussoni, Francesco Calcagno, Gioele Pasotti, Davide Avagliano, Ivan Rivalta, Marco Garavelli and Dario Polli
Raman spectroscopy is a powerful technique for probing molecular vibrations, yet the computational prediction of Raman spectra remains challenging due to the high cost of quantum chemical methods and the complexity of structure–spectrum relationships. Here, we introduce Mol2Raman, a deep-learning framework that predicts spontaneous Raman spectra directly from SMILES representations of molecules. The model leverages Graph Isomorphism Networks with edge features (GINE) to encode molecular topology and bond characteristics, enabling accurate prediction of both peak positions and intensities across diverse chemical structures. Trained on a novel dataset of over 31 000 molecules with state-of-the-art Density Functional Theory (DFT)-calculated Raman spectra, Mol2Raman outperforms both fingerprint-based similarity models and Chemprop-based neural networks. It achieves high fidelity in reproducing spectral features, including for molecules with low structural similarity to the training set and for enantiomeric inversion. The model offers fast inference times (22 ms per molecule), making it suitable for high-throughput molecular screening. We further deploy Mol2Raman as an open-access web application, enabling real-time predictions without specialized hardware. This work establishes a scalable, accurate, and interpretable platform for Raman spectral prediction, opening new opportunities in molecular design, materials discovery, and spectroscopic diagnostics.
Title: Mol2Raman: a graph neural network model for predicting Raman spectra from SMILES representations. Journal: Digital Discovery, vol. 1, pp. 161-176. Published: 2025-11-25. DOI: 10.1039/D5DD00210A (https://doi.org/10.1039/D5DD00210A). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12691243/pdf/
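The Mol2Raman abstract above names Graph Isomorphism Networks with edge features (GINE) as its encoder. The sketch below shows the update rule usually given for a single GINE layer, h_i' = MLP((1 + eps) * h_i + sum over neighbors j of ReLU(h_j + e_ji)), in plain Python. The toy atom and bond features, the identity stand-in for the MLP, and all function names are assumptions for illustration; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def gine_layer(
    h: Sequence[Vector],
    edges: Sequence[Tuple[int, int]],
    edge_attr: Sequence[Vector],
    mlp: Callable[[Vector], Vector],
    eps: float = 0.0,
) -> List[Vector]:
    """One GINE update: h_i' = mlp((1 + eps) * h_i + sum_j relu(h_j + e_ji))."""
    dim = len(h[0])
    # Accumulate ReLU(neighbor feature + bond feature) at each destination atom.
    agg = [[0.0] * dim for _ in h]
    for (src, dst), e in zip(edges, edge_attr):
        for k in range(dim):
            agg[dst][k] += max(h[src][k] + e[k], 0.0)
    # Combine each atom's own feature with its aggregated messages.
    return [
        mlp([(1.0 + eps) * h[i][k] + agg[i][k] for k in range(dim)])
        for i in range(len(h))
    ]


# Hypothetical 3-atom chain 0-1-2; each bond is listed in both directions.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_attr = [[0.5, -2.0], [0.5, -2.0], [0.0, 0.0], [0.0, 0.0]]

# The identity function stands in for a trained MLP.
out = gine_layer(h, edges, edge_attr, mlp=lambda v: v)
print(out)
```

Because the bond feature is added to the neighbor feature before the nonlinearity, the same neighbor can contribute differently depending on bond type, which is how bond characteristics enter the encoding.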
Wentao Li, Yijun Li, Qi Lei, Zemeng Wang and Xiaonan Wang
Designing high-performance polymers remains a critical challenge due to the vast design space. While machine learning and generative models have advanced polymer informatics, most approaches lack directional optimization capabilities and fail to close the loop between design and physical validation. Here we introduce PolyRL, a closed-loop reinforcement learning (RL) framework for the inverse design of gas separation polymers. By integrating reward model training, generative model pre-training, RL fine-tuning, and theoretical validation, PolyRL achieves multi-objective optimization under data-scarce conditions. We demonstrate that PolyRL is capable of efficiently generating polymer candidates with enhanced gas separation performance, as substantiated by detailed molecular simulation analyses. Additionally, we establish a standardized benchmark for RL-based polymer generation, providing a foundation for future research. This work showcases the power of reinforcement learning in polymer design and advances AI-driven materials discovery toward closed-loop, goal-directed paradigms.
Title: PolyRL: reinforcement learning-guided polymer generation for multi-objective polymer discovery. Journal: Digital Discovery, vol. 1, pp. 266-276. Published: 2025-11-25. DOI: 10.1039/D5DD00272A (https://doi.org/10.1039/D5DD00272A). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00272a?page=search
Khayrul Islam, Ryan F. Forelli, Jianzhong Han, Deven Bhadane, Jian Huang, Joshua C. Agar, Nhan Tran, Seda Ogrenci and Yaling Liu
Precise cell classification is essential in biomedical diagnostics and therapeutic monitoring, particularly for identifying diverse cell types involved in various diseases. Traditional cell classification methods, such as flow cytometry, depend on molecular labeling, which is often costly, time-intensive, and can alter cell integrity. Real-time microfluidic sorters also impose a sub-ms decision window that existing machine-learning pipelines cannot meet. To overcome these limitations, we present a label-free machine learning framework for cell classification, designed for real-time sorting applications using bright-field microscopy images. This approach leverages a teacher–student model architecture enhanced by knowledge distillation, achieving high efficiency and scalability across different cell types. Demonstrated through a use case of classifying lymphocyte subsets, our framework accurately classifies T4, T8, and B cell types with a dataset of 80 000 pre-processed images, released publicly as the LymphoMNIST package for reproducible benchmarking. Our teacher model attained 98% accuracy in differentiating T4 cells from B cells and 93% accuracy in zero-shot classification between T8 and B cells. Remarkably, our student model operates with only 5682 parameters (∼0.02% of the teacher, a 5000-fold reduction), enabling field-programmable gate array (FPGA) deployment. Implemented directly on the frame-grabber FPGA as the first demonstration of in situ deep learning in this setting, the student model achieves an ultra-low inference latency of just 14.5 µs and a complete cell detection-to-sorting trigger time of 24.7 µs, delivering 12× and 40× improvements over the previous state of the art in inference and total latency, respectively, while preserving accuracy comparable to the teacher model. This framework establishes the first sub-25 µs ML benchmark for label-free cytometry and provides an open, cost-effective blueprint for upgrading existing imaging sorters.
Title: Real-time cell sorting with scalable in situ FPGA-accelerated deep learning. Journal: Digital Discovery, vol. 1, pp. 254-265. Published: 2025-11-24. DOI: 10.1039/D5DD00345H (https://doi.org/10.1039/D5DD00345H). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00345h?page=search
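The cell-sorting abstract above describes a teacher-student architecture trained with knowledge distillation but does not state the loss. A widely used formulation, assumed here, blends a temperature-softened KL term against the teacher with ordinary cross-entropy on the true label. The function names, `temperature`, and `alpha` below are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    log_norm = top + math.log(sum(math.exp(x - top) for x in scaled))
    return [x - log_norm for x in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Hard-label cross-entropy uses the unsoftened student distribution.
    cross_entropy = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * cross_entropy


# Hypothetical logits for one image over three classes (e.g. T4, T8, B).
teacher = [2.0, 0.5, -1.0]
student = [1.2, 0.4, -0.3]
print(distillation_loss(student, teacher, label=0))
```

The softened teacher distribution carries information about how similar the classes look to the large model, which is the signal a very small student cannot easily learn from hard labels alone.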
Trevor Hastings, James Paramore, Brady Butler and Raymundo Arróyave
Bayesian optimization (BO) has emerged as an effective strategy to accelerate the discovery of new materials by efficiently exploring complex and high-dimensional design spaces. However, the success of BO methods greatly depends on how well the optimization campaign is initialized—the selection of initial data points from which the optimization starts. In this study, we focus on improving these initial datasets by incorporating materials science expertise into the selection process. We identify common challenges and sources of uncertainty when choosing these starting points and propose practical guidelines for using expert-defined criteria to create more informative initial datasets. By evaluating these methods through simulations and real-world alloy design problems, we demonstrate that using domain-informed criteria leads to initial datasets that are more diverse and representative. This enhanced starting point significantly improves the efficiency and effectiveness of subsequent optimization efforts. We also introduce clear metrics for assessing the quality and diversity of initial datasets, providing a straightforward way to compare different initialization strategies. Our approach offers a robust and widely applicable framework to enhance Bayesian optimization across various materials discovery scenarios.
Title: Leveraging domain knowledge for optimal initialization in Bayesian materials optimization. Journal: Digital Discovery, vol. 1, pp. 277-289. Published: 2025-11-20. DOI: 10.1039/D5DD00361J (https://doi.org/10.1039/D5DD00361J). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00361j?page=search
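The Bayesian optimization abstract above refers to metrics for the diversity of an initial dataset without defining them. As one possible illustration, the sketch below scores a design by its minimum pairwise distance and builds a spread-out design with greedy farthest-point selection. Both the metric and the selection rule are generic stand-ins assumed here; they are not the authors' domain-informed criteria, and the candidate coordinates are made up.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_pairwise_distance(points: Sequence[Point]) -> float:
    """Diversity proxy: the smallest distance between any two design points."""
    return min(math.dist(a, b) for a, b in combinations(points, 2))


def farthest_point_design(
    candidates: Sequence[Point], k: int, start: int = 0
) -> List[int]:
    """Greedily pick k candidate indices, each farthest from those already chosen."""
    chosen = [start]
    # nearest[i] is the distance from candidate i to its closest chosen point.
    nearest = [math.dist(c, candidates[start]) for c in candidates]
    while len(chosen) < k:
        nxt = max(range(len(candidates)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [
            min(d, math.dist(c, candidates[nxt]))
            for d, c in zip(nearest, candidates)
        ]
    return chosen


# Hypothetical candidates in a 2-D design space.
candidates = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 0.0), (0.0, 2.0), (3.0, 4.0),
]
idx = farthest_point_design(candidates, k=3)
spread = [candidates[i] for i in idx]
naive = list(candidates[:3])  # simply taking the first three candidates

print(idx, min_pairwise_distance(spread), min_pairwise_distance(naive))
```

A single scalar like this makes two initialization strategies directly comparable before any optimization is run, which is the role the abstract assigns to its metrics.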
Naruki Yoshikawa, Kevin Angers, Kourosh Darvish, Sargol Okhovatian, Dawn Bannerman, Ilya Yakavets, Milica Radisic and Alán Aspuru-Guzik
Precise liquid handling is an essential operation for self-driving laboratories. In 2023, we introduced the digital pipette, a low-cost, 3D-printed device that enables accurate liquid transfer by robotic arms. However, the initial version lacked mechanisms to prevent cross-contamination when handling multiple liquids. In this commit paper, we present the digital pipette v2, an updated design that mitigates contamination risk by allowing robotic arms to exchange pipette tips. The new hardware achieves liquid handling accuracy within the permissible error range defined by ISO 8655-2, supporting a broader range of experiments involving multiple liquids.
Title: Commit: Digital pipette: open hardware for liquid transfer in self-driving laboratories. Journal: Digital Discovery, vol. 1, pp. 93-97. Published: 2025-11-19. DOI: 10.1039/D5DD00336A (https://doi.org/10.1039/D5DD00336A). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00336a?page=search
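The digital pipette abstract above reports accuracy within the permissible error range of ISO 8655-2. Pipette performance is conventionally summarized by a systematic error (mean delivered volume versus nominal) and a random error (coefficient of variation across replicates). The sketch below computes both from replicate measurements. The function names, the sample volumes, and the limit values are hypothetical placeholders, not figures from the standard or the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def pipette_errors(volumes_ul: Sequence[float], nominal_ul: float) -> Tuple[float, float]:
    """Return (systematic error in %, random error as CV in %)."""
    avg = mean(volumes_ul)
    systematic_pct = 100.0 * (avg - nominal_ul) / nominal_ul
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    cv_pct = 100.0 * stdev(volumes_ul) / avg
    return systematic_pct, cv_pct


def within_limits(
    volumes_ul: Sequence[float],
    nominal_ul: float,
    max_systematic_pct: float,
    max_cv_pct: float,
) -> bool:
    """Check both errors against caller-supplied permissible limits."""
    systematic_pct, cv_pct = pipette_errors(volumes_ul, nominal_ul)
    return abs(systematic_pct) <= max_systematic_pct and cv_pct <= max_cv_pct


# Made-up replicate dispenses at a nominal 100 uL.
volumes = [99.0, 100.0, 101.0, 100.0]
systematic, cv = pipette_errors(volumes, nominal_ul=100.0)
print(systematic, cv)

# Placeholder limits; look up the real values for the volume class in the standard.
print(within_limits(volumes, 100.0, max_systematic_pct=1.0, max_cv_pct=1.0))
```

Keeping the limits as parameters avoids hard-coding numbers that depend on the pipette's nominal volume and on the edition of the standard being applied.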
Jason L. Wu, David M. Friday, Changhyun Hwang, Seungjoo Yi, Tiara C. Torres-Flores, Martin D. Burke, Ying Diao, Charles M. Schroeder and Nicholas E. Jackson
Machine learning (ML) is increasingly central to chemical discovery, yet most efforts remain confined to distributed and isolated research groups, limiting external validation and community engagement. Here, we introduce a generalizable mode of scientific outreach that couples a published study to a community-engaged test set, enabling post-publication evaluation by the broader ML community. This approach is demonstrated using a prior study on AI-guided discovery of photostable light-harvesting small molecules. After publishing an experimental dataset and in-house ML models, we leveraged automated block chemistry to synthesize nine additional light-harvesting molecules to serve as a blinded community test set. We then hosted an open Kaggle competition where we challenged the world community to outperform our best in-house predictive photostability model. In only one month, this competition received >700 submissions, including several innovative strategies that improved upon our previously published results. Given the success of this competition, we propose community-engaged test sets as a blueprint for post-publication benchmarking that democratizes access to high-quality experimental data, encourages innovative scientific engagement, and strengthens cross-disciplinary collaboration in the chemical sciences.
{"title":"Democratizing machine learning in chemistry with community-engaged test sets","authors":"Jason L. Wu, David M. Friday, Changhyun Hwang, Seungjoo Yi, Tiara C. Torres-Flores, Martin D. Burke, Ying Diao, Charles M. Schroeder and Nicholas E. Jackson","doi":"10.1039/D5DD00424A","DOIUrl":"https://doi.org/10.1039/D5DD00424A","url":null,"abstract":"<p >Machine learning (ML) is increasingly central to chemical discovery, yet most efforts remain confined to distributed and isolated research groups, limiting external validation and community engagement. Here, we introduce a generalizable mode of scientific outreach that couples a published study to a community-engaged test set, enabling post-publication evaluation by the broader ML community. This approach is demonstrated using a prior study on AI-guided discovery of photostable light-harvesting small molecules. After publishing an experimental dataset and in-house ML models, we leveraged automated block chemistry to synthesize nine additional light-harvesting molecules to serve as a blinded community test set. We then hosted an open Kaggle competition where we challenged the world community to outperform our best in-house predictive photostability model. In only one month, this competition received >700 submissions, including several innovative strategies that improved upon our previously published results. Given the success of this competition, we propose community-engaged test sets as a blueprint for post-publication benchmarking that democratizes access to high-quality experimental data, encourages innovative scientific engagement, and strengthens cross-disciplinary collaboration in the chemical sciences.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 304-309"},"PeriodicalIF":6.2,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00424a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146007004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bing Ma, Na Qin, Qianqian Yan, Wei Zhou, Sheng Zhang, Xiao Wang, Lipiao Bao and Xing Lu
Porous framework materials—including metal–organic frameworks (MOFs) and covalent organic frameworks (COFs)—have attracted widespread attention due to their high surface areas, tunable pore structures, and diverse functionalities, enabling promising applications in gas separation, catalysis, and energy storage. However, the vast chemical configuration space and the complexity of multi-parameter synthesis conditions pose significant challenges to the rational design and controlled synthesis of materials with targeted properties. In recent years, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), in combination with multiscale molecular simulation methods such as density functional theory (DFT), grand canonical Monte Carlo (GCMC), and molecular dynamics (MD), has emerged as a powerful tool for accelerating the screening and optimization of framework materials. This review systematically summarizes AI-assisted strategies for framework material design, focusing on data-driven prediction of synthetic routes, optimization of reaction conditions, and inverse design targeting specific functionalities. We evaluate key AI models, including interpretable tree-based algorithms and neural networks capable of modeling complex structure–property relationships, and highlight their integration with atomistic simulations to enhance predictive accuracy. Furthermore, the synergy between AI and automated experimental platforms is advancing the development of high-throughput experimentation and self-optimizing workflows, often referred to as self-driving laboratories. Several case studies illustrate the effectiveness of AI methods in identifying high-performance framework materials and achieving morphology control, particularly when leveraging the integration of experimental and simulation data. 
The review also discusses key challenges in AI-assisted materials design, including inconsistent data quality, limited model interpretability, and the gap between prediction and practical synthesis. Looking ahead, the continued expansion of materials databases, advances in AI algorithms, and deeper integration of domain knowledge are expected to play an increasingly vital role in framework material development, driving a paradigm shift in materials research from empirical trial-and-error to more efficient, predictive, and intelligent design.
{"title":"Advancing metal organic framework and covalent organic framework design via the digital-intelligent paradigm","authors":"Bing Ma, Na Qin, Qianqian Yan, Wei Zhou, Sheng Zhang, Xiao Wang, Lipiao Bao and Xing Lu","doi":"10.1039/D5DD00401B","DOIUrl":"https://doi.org/10.1039/D5DD00401B","url":null,"abstract":"<p >Porous framework materials—including metal–organic frameworks (MOFs) and covalent organic frameworks (COFs)—have attracted widespread attention due to their high surface areas, tunable pore structures, and diverse functionalities, enabling promising applications in gas separation, catalysis, and energy storage. However, the vast chemical configuration space and the complexity of multi-parameter synthesis conditions pose significant challenges to the rational design and controlled synthesis of materials with targeted properties. In recent years, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), in combination with multiscale molecular simulation methods such as density functional theory (DFT), grand canonical Monte Carlo (GCMC), and molecular dynamics (MD), has emerged as a powerful tool for accelerating the screening and optimization of framework materials. This review systematically summarizes AI-assisted strategies for framework material design, focusing on data-driven prediction of synthetic routes, optimization of reaction conditions, and inverse design targeting specific functionalities. We evaluate key AI models, including interpretable tree-based algorithms and neural networks capable of modeling complex structure–property relationships, and highlight their integration with atomistic simulations to enhance predictive accuracy. Furthermore, the synergy between AI and automated experimental platforms is advancing the development of high-throughput experimentation and self-optimizing workflows, often referred to as self-driving laboratories. Several case studies illustrate the effectiveness of AI methods in identifying high-performance framework materials and achieving morphology control, particularly when leveraging the integration of experimental and simulation data. The review also discusses key challenges in AI-assisted materials design, including inconsistent data quality, limited model interpretability, and the gap between prediction and practical synthesis. Looking ahead, the continued expansion of materials databases, advances in AI algorithms, and deeper integration of domain knowledge are expected to play an increasingly vital role in framework material development, driving a paradigm shift in materials research from empirical trial-and-error to more efficient, predictive, and intelligent design.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 2","pages":" 523-547"},"PeriodicalIF":6.2,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00401b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146211325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}