Wenyu Zhang, Mason A. Guy, Jerrica Yang, Lucy Hao, Junliang Liu, Joel M. Hawkins, Jason Mustakis, Sebastien Monfette and Jason E. Hein
Large Language Models (LLMs) have revolutionized numerous industries and accelerated scientific research. However, their application in planning and conducting experimental science has been limited. In this study, we introduce an adaptable prompt-set with GPT-4, converting literature experimental procedures into actionable experimental steps for a Mettler Toledo EasyMax automated laboratory reactor. Through prompt engineering, we developed a two-step sequential prompt: the first prompt converts literature synthesis procedures into step-by-step instructions for reaction planning; the second prompt generates an XML script to communicate these instructions to the EasyMax reactor, automating experimental design and execution. We successfully automated the reproduction of three distinct literature-based synthetic procedures and validated the reactions by monitoring and characterizing the products. This approach bridges the gap between text-to-procedure transcription and automated execution, and streamlines literature procedure reproduction.
{"title":"Leveraging GPT-4 to transform chemistry from paper to practice†","authors":"Wenyu Zhang, Mason A. Guy, Jerrica Yang, Lucy Hao, Junliang Liu, Joel M. Hawkins, Jason Mustakis, Sebastien Monfette and Jason E. Hein","doi":"10.1039/D4DD00248B","DOIUrl":"https://doi.org/10.1039/D4DD00248B","url":null,"abstract":"<p >Large Language Models (LLMs) have revolutionized numerous industries as well as accelerated scientific research. However, their application in planning and conducting experimental science, has been limited. In this study, we introduce an adaptable prompt-set with GPT-4, converting literature experimental procedures into actionable experimental steps for a Mettler Toledo EasyMax automated laboratory reactor. Through prompt engineering, we developed a 2-step sequential prompt: the first prompt converts literature synthesis procedures into step-by-step instructions for reaction planning; the second prompt generates an XML script to communicate these instructions to the EasyMax reactor, automating experimental design and execution. We successfully automated the reproduction of three distinct literature-based synthetic procedures and validated the reactions by monitoring and characterizing the products. 
This approach bridges the gap between text-to-procedure transcription and automated execution, and streamlines literature procedure reproduction.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2367-2376"},"PeriodicalIF":6.2,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00248b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
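The two-step sequential prompt described for the EasyMax workflow above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the prompt templates, the `fake_llm` stand-in (used in place of a real GPT-4 call so the example runs offline), and the XML element and attribute names, which are not the EasyMax script schema.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
STEP1_TEMPLATE = "Convert this literature procedure into numbered reactor steps:\n{procedure}"
STEP2_TEMPLATE = "Convert these steps into an XML script for the reactor:\n{steps}"


def run_two_step_prompt(procedure: str, llm: Callable[[str], str]) -> ET.Element:
    """Chain two prompts: procedure text -> step list -> XML script."""
    steps = llm(STEP1_TEMPLATE.format(procedure=procedure))
    xml_text = llm(STEP2_TEMPLATE.format(steps=steps))
    # Parsing doubles as a well-formedness check: malformed XML raises ParseError.
    return ET.fromstring(xml_text)


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; returns canned text so the sketch runs offline."""
    if prompt.startswith("Convert this literature procedure"):
        return "1. Heat to 60 C\n2. Stir for 30 min"
    # Hypothetical XML layout, not the real EasyMax format.
    return "<script><step action='heat' target_c='60'/><step action='stir' minutes='30'/></script>"


script = run_two_step_prompt(
    "The mixture was heated to 60 C and stirred for 30 min.", fake_llm
)
actions = [step.get("action") for step in script.findall("step")]
print(actions)
```

Keeping the model call behind a plain callable means the same chain can be exercised with canned responses before it is pointed at a real model or a real reactor.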
Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain and Gerbrand Ceder
The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. AlabOS features a reconfigurable experiment workflow model and a resource reservation mechanism, enabling the simultaneous execution of varied workflows composed of modular tasks while eliminating conflicts between tasks. To showcase its capability, we demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory, the A-Lab, with around 3500 samples synthesized over 1.5 years.
{"title":"AlabOS: a Python-based reconfigurable workflow management framework for autonomous laboratories","authors":"Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain and Gerbrand Ceder","doi":"10.1039/D4DD00129J","DOIUrl":"https://doi.org/10.1039/D4DD00129J","url":null,"abstract":"<p >The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. AlabOS features a reconfigurable experiment workflow model and a resource reservation mechanism, enabling the simultaneous execution of varied workflows composed of modular tasks while eliminating conflicts between tasks. 
To showcase its capability, we demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory, the A-Lab, with around 3500 samples synthesized over 1.5 years.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2275-2288"},"PeriodicalIF":6.2,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00129j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
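The resource reservation idea in the AlabOS abstract above, where tasks from different workflows run simultaneously without conflicting over shared equipment, can be sketched as an all-or-nothing reservation. This is not the AlabOS API: the `ResourceManager` class, its methods, and the device and task names are hypothetical and chosen only to show the principle.

```python
class ResourceManager:
    """Toy all-or-nothing reservation table (illustrative, not AlabOS code)."""

    def __init__(self, resources):
        # Map each resource name to the task that currently holds it (None = free).
        self._owner = {name: None for name in resources}

    def try_reserve(self, task, needed):
        """Grant every requested resource, or none of them if any is busy."""
        if any(self._owner[name] is not None for name in needed):
            return False
        for name in needed:
            self._owner[name] = task
        return True

    def release(self, task):
        """Free every resource held by the given task."""
        for name, owner in self._owner.items():
            if owner == task:
                self._owner[name] = None


manager = ResourceManager(["furnace_1", "robot_arm", "xrd"])
got_a = manager.try_reserve("heating_A", ["furnace_1", "robot_arm"])
got_b = manager.try_reserve("diffraction_B", ["xrd", "robot_arm"])  # arm is busy
manager.release("heating_A")
got_b_retry = manager.try_reserve("diffraction_B", ["xrd", "robot_arm"])
print(got_a, got_b, got_b_retry)
```

Granting a request only when every needed resource is free means a task never holds part of its equipment while waiting for the rest, which is one simple way to rule out conflicts between concurrent tasks.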
We present an autonomous data-driven framework that iteratively explores the experimental design space of silver nanoparticle synthesis to obtain control over the formation of a desired morphology and size. The objective of the method is to identify design rules such as the effects of the design variables on the structure of the nanoparticle. The framework balances multimodal characterization methods (i.e. UV-vis spectroscopy, SAXS, TEM), taking into account the cost of performing a measurement and the quality of information gained. By integrating with an AI agent, we identify important design variables in the synthesis of small colloidally stable plate-like silver particles and outline how each variable affects plate thickness, radius, polydispersity, and relative concentration. Our findings are consistent with the literature, demonstrating that the framework could be further applied to new systems that have not been well characterized and understood. The framework is generalizable and allows tangible knowledge extraction from the high-throughput experimental runs while still considering inherent stochasticity.
{"title":"Data-driven exploration of silver nanoplate formation in multidimensional chemical design spaces†","authors":"Huat Thart Chiang, Kiran Vaddi and Lilo Pozzo","doi":"10.1039/D4DD00211C","DOIUrl":"https://doi.org/10.1039/D4DD00211C","url":null,"abstract":"<p >We present an autonomous data-driven framework that iteratively explores the experimental design space of silver nanoparticle synthesis to obtain control over the formation of a desired morphology and size. The objective of the method is to identify design rules such as the effects of the design variables on the structure of the nanoparticle. The framework balances multimodal characterization methods (<em>i.e.</em> UV-vis spectroscopy, SAXS, TEM), taking into account the cost of performing a measurement and the quality of information gained. By integrating with an AI agent, we identify important design variables in the synthesis of small colloidally stable plate-like silver particles and outline how each variable affects plate thickness, radius, polydispersity, and relative concentration. Our findings are consistent with the literature, demonstrating that the framework could be further applied to new systems that have not been well characterized and understood. 
The framework is generalizable and allows tangible knowledge extraction from the high-throughput experimental runs while still considering inherent stochasticity.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2252-2264"},"PeriodicalIF":6.2,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00211c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
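The abstract above says the framework weighs the cost of each characterization method against the quality of information it yields. One simple way to express such a trade-off is an information-per-cost rule, sketched below. The rule itself, the `Measurement` fields, and all numeric values are assumptions for illustration; the abstract does not state how the authors' framework actually scores measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    name: str
    cost: float       # hypothetical relative cost of one measurement
    info_gain: float  # hypothetical expected information from one measurement


def pick_measurement(options, budget):
    """Return the affordable option with the best information-per-cost ratio."""
    affordable = [m for m in options if m.cost <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda m: m.info_gain / m.cost)


# Made-up numbers: cheap UV-vis, mid-cost SAXS, expensive TEM.
options = [
    Measurement("UV-vis", cost=1.0, info_gain=0.2),
    Measurement("SAXS", cost=5.0, info_gain=2.0),
    Measurement("TEM", cost=20.0, info_gain=5.0),
]

choice_mid_budget = pick_measurement(options, budget=10.0)
choice_low_budget = pick_measurement(options, budget=1.0)
print(choice_mid_budget.name, choice_low_budget.name)
```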
Alexandre A. Schoepfer, Jan Weinreich, Ruben Laplaza, Jerome Waser and Clemence Corminboeuf
Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost- and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments. Reagents are used only when their anticipated improvement in reaction performance sufficiently outweighs their costs. Our algorithm tracks available reagents, including those recently acquired, and dynamically updates their cost during the optimization. Using literature data of Pd-catalyzed reactions, we show that CIBO reduces the cost of reaction optimization by up to 90% compared to standard BO. Our approach is compatible with any type of cost, e.g., of buying equipment or compounds, waiting time, as well as environmental or security concerns. We believe CIBO extends the possibilities of BO in chemistry and envision applications for both traditional and self-driving laboratories for experiment planning.
{"title":"Cost-informed Bayesian reaction optimization†","authors":"Alexandre A. Schoepfer, Jan Weinreich, Ruben Laplaza, Jerome Waser and Clemence Corminboeuf","doi":"10.1039/D4DD00225C","DOIUrl":"10.1039/D4DD00225C","url":null,"abstract":"<p >Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments. Reagents are used only when their anticipated improvement in reaction performance sufficiently outweighs their costs. Our algorithm tracks available reagents, including those recently acquired, and dynamically updates their cost during the optimization. Using literature data of Pd-catalyzed reactions, we show that CIBO reduces the cost of reaction optimization by up to 90% compared to standard BO. Our approach is compatible with any type of cost, <em>e.g.</em>, of buying equipment or compounds, waiting time, as well as environmental or security concerns. 
We believe CIBO extends the possibilities of BO in chemistry and envision applications for both traditional and self-driving laboratories for experiment planning.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2289-2297"},"PeriodicalIF":6.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465108/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
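The cost-informed selection described in the CIBO abstract above, where a new reagent is chosen only when its anticipated improvement outweighs its cost and a reagent costs nothing once acquired, can be sketched as follows. The subtractive score, the `cost_weight` parameter, and all names and numbers are assumptions; the abstract does not give the authors' acquisition function, and the expected-improvement values here are supplied by hand rather than computed from a surrogate model.

```python
def select_experiment(candidates, prices, owned, cost_weight=1.0):
    """Pick the candidate with the best (expected improvement - weighted cost).

    candidates: list of (reagent, condition, expected_improvement) tuples.
    prices:     dict mapping reagent -> purchase cost (hypothetical units).
    owned:      set of reagents already in stock; updated in place.
    """
    def score(candidate):
        reagent, _condition, expected_improvement = candidate
        # A reagent already in stock costs nothing to use again.
        cost = 0.0 if reagent in owned else prices[reagent]
        return expected_improvement - cost_weight * cost

    best = max(candidates, key=score)
    owned.add(best[0])  # track the acquisition so later rounds see zero cost
    return best


prices = {"ligand_A": 0.10, "ligand_B": 0.25}
owned = {"ligand_A"}

# Round 1: ligand_B looks better but does not yet outweigh its price.
round_1 = select_experiment(
    [("ligand_A", "80C", 0.10), ("ligand_A", "100C", 0.12), ("ligand_B", "80C", 0.30)],
    prices, owned,
)

# Round 2: the anticipated gain from ligand_B now exceeds its cost, so it is bought.
round_2 = select_experiment(
    [("ligand_A", "100C", 0.12), ("ligand_B", "80C", 0.50)],
    prices, owned,
)
print(round_1, round_2, sorted(owned))
```

After round 2, `ligand_B` is in `owned`, so later rounds compare it with `ligand_A` on expected improvement alone.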
Saurabh Vashistha, Bashista Kumar Mahanta, Vivek Kumar Singh, Neha Sharma, Anjan Ray, Saurabh Dixit and Shailesh Kumar Singh
This study marks a notable advancement in tribology by thoroughly investigating the tribological properties of a high-entropy alloy under both lubricated and dry conditions. The research encompasses a detailed evaluation of the alloy's wear behavior, utilizing a data-driven modeling approach that employs an evolutionary framework to build and validate a predictive model. The findings offer critical insights into the tribological performance of high-entropy alloys under diverse operational and lubrication conditions. Specifically, the Al–Co–Cr–Fe–Ni alloy exhibits exceptional tribological properties, with a coefficient of friction ranging from 0.0165 to 0.6024 and surface roughness between 0.261 and 1.11. A data-driven methodology was employed to develop a predictive model with an accuracy exceeding 94%, effectively capturing the precise trends in lubrication behavior and providing in-depth information on surface characteristics for future experimental endeavors and data extraction. Additionally, the study underscores the profound impact of lubricant chemical composition on the wear behavior of the alloy, highlighting the crucial importance of selecting appropriate lubricants for specific tribological applications.
{"title":"Machine learning-assisted analysis of dry and lubricated tribological properties of Al–Co–Cr–Fe–Ni high entropy alloy","authors":"Saurabh Vashistha, Bashista Kumar Mahanta, Vivek Kumar Singh, Neha Sharma, Anjan Ray, Saurabh Dixit and Shailesh Kumar Singh","doi":"10.1039/D4DD00169A","DOIUrl":"https://doi.org/10.1039/D4DD00169A","url":null,"abstract":"<p >This study marks a notable advancement in tribology by thoroughly investigating the tribological properties of a high-entropy alloy under both lubricated and dry conditions. The research encompasses a detailed evaluation of the alloy's wear behavior, utilizing a data-driven modeling approach that employs an evolutionary framework to build and validate a predictive model. The findings offer critical insights into the tribological performance of high-entropy alloys under diverse operational and lubrication conditions. Specifically, the Al–Co–Cr–Fe–Ni alloy exhibits exceptional tribological properties, with a coefficient of friction ranging from 0.0165 to 0.6024 and surface roughness between 0.261 and 1.11. A data-driven methodology was employed to develop a predictive model with an accuracy exceeding 94%, effectively capturing the precise trends in lubrication behavior and providing in-depth information on surface characteristics for future experimental endeavors and data extraction. 
Additionally, the study underscores the profound impact of lubricant chemical composition on the wear behavior of the alloy, highlighting the crucial importance of selecting appropriate lubricants for specific tribological applications.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2226-2241"},"PeriodicalIF":6.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00169a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies, which leverage the ubiquitin–proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. This unique mechanism can be particularly useful for dealing with proteins that were once deemed “undruggable” using conventional small-molecule drugs. PROTACs are hetero-bifunctional molecules consisting of two ligands, connected by a chemical linker. As the field evolves, it becomes increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we aim to provide a thorough exploration of the impact of ML on de novo PROTAC design – an aspect of molecular design that has not been comprehensively reviewed despite its significance. Initially, we delve into the distinct characteristics of PROTAC linker design, underscoring the complexities required to create effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. 
By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for biologists, chemists, and ML practitioners alike in their pursuit of better design strategies for this new modality.
{"title":"A comprehensive review of emerging approaches in machine learning for de novo PROTAC design","authors":"Yossra Gharbi and Rocío Mercado","doi":"10.1039/D4DD00177J","DOIUrl":"https://doi.org/10.1039/D4DD00177J","url":null,"abstract":"<p >Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies, which leverage the ubiquitin–proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. This unique mechanism can be particularly useful for dealing with proteins that were once deemed “undruggable” using conventional small-molecule drugs. PROTACs are hetero-bifunctional molecules consisting of two ligands, connected by a chemical linker. As the field evolves, it becomes increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we aim to provide a thorough exploration of the impact of ML on <em>de novo</em> PROTAC design – an aspect of molecular design that has not been comprehensively reviewed despite its significance. Initially, we delve into the distinct characteristics of PROTAC linker design, underscoring the complexities required to create effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. 
Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for biologists, chemists, and ML practitioners alike in their pursuit of better design strategies for this new modality.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2158-2176"},"PeriodicalIF":6.2,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00177j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis–process–structure–property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis–process–structure–property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization and probability to merge knowledge across data sources into a unified model of synthesis–process–structure–property relationships. SAGE outputs a probabilistic posterior including the most likely relationship given the data along with proper uncertainty quantification. Beyond autonomous systems, SAGE will allow materials researchers to unify knowledge across their lab toward making better experiment design decisions.
{"title":"Learning material synthesis–process–structure–property relationship by data fusion: Bayesian co-regionalization N-dimensional piecewise function learning†","authors":"A. Gilad Kusne, Austin McDannald and Brian DeCost","doi":"10.1039/D4DD00048J","DOIUrl":"https://doi.org/10.1039/D4DD00048J","url":null,"abstract":"<p >Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis–process–structure–property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis–process–structure–property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization and probability to merge knowledge across data sources into a unified model of synthesis–process–structure–property relationships. SAGE outputs a probabilistic posterior including the most likely relationship given the data along with proper uncertainty quantification. Beyond autonomous systems, SAGE will allow materials researchers to unify knowledge across their lab toward making better experiment design decisions.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2211-2225"},"PeriodicalIF":6.2,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00048j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jehad Abed, Yang Bai, Daniel Persaud, Jiheon Kim, Julia Witt, Jason Hattrick-Simpers and Edward H. Sargent
Rapid and reliable electrochemical screening is critical to accelerate the development of catalysts for sustainable energy generation and storage. This paper introduces an automated and modular platform for expedited and reproducible electrochemical testing (AMPERE), designed to enhance the efficiency and reliability of multivariate optimization. The platform integrates a liquid-handling robot with custom-made modular array reactors, offering sample preparation and electrochemical testing in the same platform. Additionally, we use offline inductively coupled plasma optical emission spectroscopy (ICP-OES) to measure metal concentrations in the electrolyte after the reaction, which serves as a proxy for assessing the electrochemical stability. We use the platform to conduct 168 experiments continuously in less than 40 hours to examine the influence of catalyst ink formulation on the performance of Ir, Ru, IrO₂, and RuO₂ for the oxygen evolution reaction (OER) in acid. We specifically investigate the role of solvent type and concentration, catalyst concentration, and binder content on the performance. We find that Ru/RuO₂ catalysts show improvements in activity that are not directly linked to improvements in the electrochemical surface area or inversely correlated to Ru dissolution. This suggests a complex interplay between the catalytic performance of the drop-cast catalyst film and ink formulation. AMPERE simplifies catalyst preparation and testing at large scale, making it faster, more reliable, and accessible for widespread use.
{"title":"AMPERE: automated modular platform for expedited and reproducible electrochemical testing†","authors":"Jehad Abed, Yang Bai, Daniel Persaud, Jiheon Kim, Julia Witt, Jason Hattrick-Simpers and Edward H. Sargent","doi":"10.1039/D4DD00203B","DOIUrl":"https://doi.org/10.1039/D4DD00203B","url":null,"abstract":"<p >Rapid and reliable electrochemical screening is critical to accelerate the development of catalysts for sustainable energy generation and storage. This paper introduces an automated and modular platform for expedited and reproducible electrochemical testing (AMPERE), designed to enhance the efficiency and reliability of multivariate optimization. The platform integrates a liquid-handling robot with custom-made modular array reactors, offering sample preparation and electrochemical testing in the same platform. Additionally, we use offline inductively coupled plasma optical emission spectroscopy (ICP-OES) to measure metal concentrations in the electrolyte after the reaction, which serves as a proxy for assessing the electrochemical stability. We use the platform to conduct 168 experiments continuously in less than 40 hours to examine the influence of catalyst ink formulation on the performance of Ir, Ru, IrO<small><sub>2</sub></small>, and RuO<small><sub>2</sub></small> for the oxygen evolution reaction (OER) in acid. We specifically investigate the role of solvent type and concentration, catalyst concentration, and binder content on the performance. We find that Ru/RuO<small><sub>2</sub></small> catalysts show improvements in activity that are not directly linked to improvements in the electrochemical surface area or inversely correlated to Ru dissolution. This suggests a complex interplay between the catalytic performance of the drop-casted catalyst film and ink formulation. 
AMPERE simplifies catalyst preparation and testing at large scale, making it faster, more reliable, and accessible for widespread use.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2265-2274"},"PeriodicalIF":6.2,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00203b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qisong Xu, Alan K. X. Tan, Liangfeng Guo, Yee Hwee Lim, Dillon W. P. Tay and Shi Jun Ang
Taxonomical classification of natural products (NPs) can assist in genomic and phylogenetic analysis of source organisms and facilitate streamlining of bioprospecting efforts. Here, a composite machine learning strategy marrying graph convolutional neural networks (GCNNs) and eXtreme Gradient boosting (XGB) is proposed and validated for taxonomical classification of NPs in five kingdoms (Animalia, Bacteria, Chromista, Fungi, and Plantae). Our composite model, trained on 133 092 NPs from the LOTUS database, achieved five-fold cross-validated classification accuracy of 97.4%. When employed to classify out-of-sample NPs from the NP Atlas database, accuracies of 82.8% for bacteria and 86.6% for fungi were obtained. Dimensionality-reduced representations of the molecular embeddings from our composite model revealed distinct clusters of NPs that suggest a basis for enhanced classification performance. The top critical substructures from the NPs of each kingdom were also identified and compared to provide insights on structure–taxonomy relationships. Overall, this study showcases the potential of composite machine learning models for robust taxonomical classification of NPs, which can streamline discovery of NPs.
{"title":"Composite machine learning strategy for natural products taxonomical classification and structural insights†","authors":"Qisong Xu, Alan K. X. Tan, Liangfeng Guo, Yee Hwee Lim, Dillon W. P. Tay and Shi Jun Ang","doi":"10.1039/D4DD00155A","DOIUrl":"https://doi.org/10.1039/D4DD00155A","url":null,"abstract":"<p >Taxonomical classification of natural products (NPs) can assist in genomic and phylogenetic analysis of source organisms and facilitate streamlining of bioprospecting efforts. Here, a composite machine learning strategy marrying graph convolutional neural networks (GCNNs) and eXtreme Gradient Boosting (XGB) is proposed and validated for taxonomical classification of NPs in five kingdoms (Animalia, Bacteria, Chromista, Fungi, and Plantae). Our composite model, trained on 133 092 NPs from the LOTUS database, achieved five-fold cross-validated classification accuracy of 97.4%. When employed to classify out-of-sample NPs from the NP Atlas database, accuracies of 82.8% for bacteria and 86.6% for fungi were obtained. Dimensionality-reduced representations of the molecular embeddings from our composite model revealed distinct clusters of NPs that suggest a basis for enhanced classification performance. The top critical substructures from the NPs of each kingdom were also identified and compared to provide insights on structure–taxonomy relationships. 
Overall, this study showcases the potential of composite machine learning models for robust taxonomical classification of NPs, which can streamline discovery of NPs.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2192-2200"},"PeriodicalIF":6.2,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00155a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salatan Duangdangchote, Dwight S. Seferos and Oleksandr Voznyy
In this study, we focus on simplifying the generation of Machine Learning Force Fields (MLFFs) for Molecular Dynamics (MD) simulations of inorganic materials, with an emphasis on sustainable use of computational resources. We evaluate the efficiency and accuracy of existing state-of-the-art graph neural network (GNN) models and introduce new benchmarks that go beyond conventional mean absolute error on forces and energies. We showcase our methodology using lithium-ion conductor materials as an example, paving the way to a broader screening of ionic conductors for batteries and fuel cells.
{"title":"Stability and transferability of machine learning force fields for molecular dynamics applications†","authors":"Salatan Duangdangchote, Dwight S. Seferos and Oleksandr Voznyy","doi":"10.1039/D4DD00140K","DOIUrl":"https://doi.org/10.1039/D4DD00140K","url":null,"abstract":"<p >In this study, we focus on simplifying the generation of Machine Learning Force Fields (MLFFs) for Molecular Dynamics (MD) simulations of inorganic materials, with an emphasis on sustainable use of computational resources. We evaluate the efficiency and accuracy of existing state-of-the-art graph neural network (GNN) models and introduce new benchmarks that go beyond conventional mean absolute error on forces and energies. We showcase our methodology using lithium-ion conductor materials as an example, paving the way to a broader screening of ionic conductors for batteries and fuel cells.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2177-2182"},"PeriodicalIF":6.2,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00140k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}