
Digital Discovery: latest articles

Leveraging GPT-4 to transform chemistry from paper to practice†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-03 DOI: 10.1039/D4DD00248B
Wenyu Zhang, Mason A. Guy, Jerrica Yang, Lucy Hao, Junliang Liu, Joel M. Hawkins, Jason Mustakis, Sebastien Monfette and Jason E. Hein

Large Language Models (LLMs) have revolutionized numerous industries as well as accelerated scientific research. However, their application in planning and conducting experimental science has been limited. In this study, we introduce an adaptable prompt-set with GPT-4, converting literature experimental procedures into actionable experimental steps for a Mettler Toledo EasyMax automated laboratory reactor. Through prompt engineering, we developed a two-step sequential prompt: the first prompt converts literature synthesis procedures into step-by-step instructions for reaction planning; the second prompt generates an XML script to communicate these instructions to the EasyMax reactor, automating experimental design and execution. We successfully automated the reproduction of three distinct literature-based synthetic procedures and validated the reactions by monitoring and characterizing the products. This approach bridges the gap between text-to-procedure transcription and automated execution, and streamlines literature procedure reproduction.
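
As a rough illustration of the second stage described in this abstract (turning step-by-step instructions into an XML script), the sketch below serializes a list of steps with Python's standard library. It makes no call to GPT-4. The step fields, tag names, and attribute names are all hypothetical; they are not the EasyMax schema or the authors' prompt output.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate format: what a first prompt might return after
# parsing a literature procedure (field names are illustrative only).
steps = [
    {"action": "set_temperature", "value": "25", "unit": "C"},
    {"action": "stir", "value": "300", "unit": "rpm"},
    {"action": "wait", "value": "60", "unit": "min"},
]


def steps_to_xml(steps):
    """Serialize a list of step dicts into an XML string.

    Tag and attribute names are invented for this sketch and do not
    follow any real reactor schema.
    """
    root = ET.Element("experiment")
    for index, step in enumerate(steps, start=1):
        node = ET.SubElement(root, "step", number=str(index), action=step["action"])
        node.set("unit", step["unit"])
        node.text = step["value"]
    return ET.tostring(root, encoding="unicode")


xml_script = steps_to_xml(steps)
```

Keeping the intermediate steps as plain data before serializing makes it possible to inspect or correct them by hand before anything is sent to hardware.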

Digital discovery 11, 2367-2376 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00248b?page=search
Citations: 0
AlabOS: a Python-based reconfigurable workflow management framework for autonomous laboratories
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-03 DOI: 10.1039/D4DD00129J
Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain and Gerbrand Ceder

The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. AlabOS features a reconfigurable experiment workflow model and a resource reservation mechanism, enabling the simultaneous execution of varied workflows composed of modular tasks while eliminating conflicts between tasks. To showcase its capability, we demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory, the A-Lab, with around 3500 samples synthesized over 1.5 years.
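
The resource reservation idea mentioned in this abstract can be pictured with a toy all-or-nothing reservation scheme: a task runs only if every device it needs is free. This is a minimal sketch under that assumption, not the AlabOS API; the class, method, and device names are hypothetical.

```python
class ResourceManager:
    """Toy all-or-nothing reservation of named lab resources."""

    def __init__(self, resources):
        # Map each resource name to the task holding it (None means free).
        self.owner = {name: None for name in resources}

    def reserve(self, task, needed):
        # Grant the request only if every requested resource is free,
        # so two tasks never hold the same device at once.
        if any(self.owner[name] is not None for name in needed):
            return False
        for name in needed:
            self.owner[name] = task
        return True

    def release(self, task):
        # Free everything held by the finished task.
        for name, holder in self.owner.items():
            if holder == task:
                self.owner[name] = None


manager = ResourceManager(["furnace_1", "robot_arm", "xrd"])
first = manager.reserve("heating", ["furnace_1", "robot_arm"])
second = manager.reserve("diffraction", ["robot_arm", "xrd"])  # robot_arm is busy
manager.release("heating")
third = manager.reserve("diffraction", ["robot_arm", "xrd"])
```

Here the second request is refused because the robot arm is still held by the heating task, and the same request succeeds once that task releases its devices.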

Digital discovery 11, 2275-2288 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00129j?page=search
Citations: 0
Data-driven exploration of silver nanoplate formation in multidimensional chemical design spaces†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-02 DOI: 10.1039/D4DD00211C
Huat Thart Chiang, Kiran Vaddi and Lilo Pozzo

We present an autonomous data-driven framework that iteratively explores the experimental design space of silver nanoparticle synthesis to obtain control over the formation of a desired morphology and size. The objective of the method is to identify design rules such as the effects of the design variables on the structure of the nanoparticle. The framework balances multimodal characterization methods (i.e. UV-vis spectroscopy, SAXS, TEM), taking into account the cost of performing a measurement and the quality of information gained. By integrating with an AI agent, we identify important design variables in the synthesis of small colloidally stable plate-like silver particles and outline how each variable affects plate thickness, radius, polydispersity, and relative concentration. Our findings are consistent with the literature, demonstrating that the framework could be further applied to new systems that have not been well characterized and understood. The framework is generalizable and allows tangible knowledge extraction from the high-throughput experimental runs while still considering inherent stochasticity.

Digital discovery 11, 2252-2264 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00211c?page=search
Citations: 0
Cost-informed Bayesian reaction optimization†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-01 DOI: 10.1039/D4DD00225C
Alexandre A. Schoepfer, Jan Weinreich, Ruben Laplaza, Jerome Waser and Clemence Corminboeuf

Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost- and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments. Reagents are used only when their anticipated improvement in reaction performance sufficiently outweighs their costs. Our algorithm tracks available reagents, including those recently acquired, and dynamically updates their cost during the optimization. Using literature data of Pd-catalyzed reactions, we show that CIBO reduces the cost of reaction optimization by up to 90% compared to standard BO. Our approach is compatible with any type of cost, e.g., of buying equipment or compounds, waiting time, as well as environmental or security concerns. We believe CIBO extends the possibilities of BO in chemistry and envision applications for both traditional and self-driving laboratories for experiment planning.
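
The trade-off described in this abstract (use a reagent only when its expected benefit outweighs its cost, and stop charging for reagents already in stock) can be illustrated with a toy selection rule. This is a sketch under strong assumptions, not the CIBO acquisition function: the score is simply expected improvement minus a weighted purchase cost, both in the same arbitrary units, and every name and number below is hypothetical.

```python
def pick_experiment(candidates, inventory, prices, cost_weight=1.0):
    """Return the candidate with the best improvement-minus-cost score."""

    def cost(candidate):
        # Only reagents that still have to be bought contribute to the cost.
        return sum(prices[r] for r in candidate["reagents"] if r not in inventory)

    def score(candidate):
        return candidate["expected_improvement"] - cost_weight * cost(candidate)

    return max(candidates, key=score)


prices = {"ligand_A": 0.0, "ligand_B": 5.0}
candidates = [
    {"name": "A at 80 C", "reagents": ["ligand_A"], "expected_improvement": 2.0},
    {"name": "B at 80 C", "reagents": ["ligand_B"], "expected_improvement": 4.0},
]
inventory = {"ligand_A"}

choice_before = pick_experiment(candidates, inventory, prices)
inventory.add("ligand_B")  # once bought, ligand_B no longer adds cost
choice_after = pick_experiment(candidates, inventory, prices)
```

With these made-up numbers the cheaper in-stock ligand is chosen first; after ligand_B enters the inventory its cost drops out of the score and the higher-improvement experiment wins.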

Digital discovery 11, 2289-2297 (2024). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465108/pdf/
Citations: 0
Machine learning-assisted analysis of dry and lubricated tribological properties of Al–Co–Cr–Fe–Ni high entropy alloy
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-30 DOI: 10.1039/D4DD00169A
Saurabh Vashistha, Bashista Kumar Mahanta, Vivek Kumar Singh, Neha Sharma, Anjan Ray, Saurabh Dixit and Shailesh Kumar Singh

This study marks a notable advancement in tribology by thoroughly investigating the tribological properties of a high-entropy alloy under both lubricated and dry conditions. The research encompasses a detailed evaluation of the alloy's wear behavior, utilizing a data-driven modeling approach that employs an evolutionary framework to build and validate a predictive model. The findings offer critical insights into the tribological performance of high-entropy alloys under diverse operational and lubrication conditions. Specifically, the Al–Co–Cr–Fe–Ni alloy exhibits exceptional tribological properties, with a coefficient of friction ranging from 0.0165 to 0.6024 and surface roughness between 0.261 and 1.11. A data-driven methodology was employed to develop a predictive model with an accuracy exceeding 94%, effectively capturing the precise trends in lubrication behavior and providing in-depth information on surface characteristics for future experimental endeavors and data extraction. Additionally, the study underscores the profound impact of lubricant chemical composition on the wear behavior of the alloy, highlighting the crucial importance of selecting appropriate lubricants for specific tribological applications.

Digital discovery 11, 2226-2241 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00169a?page=search
Citations: 0
A comprehensive review of emerging approaches in machine learning for de novo PROTAC design
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-27 DOI: 10.1039/D4DD00177J
Yossra Gharbi and Rocío Mercado

Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies, which leverage the ubiquitin–proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. This unique mechanism can be particularly useful for dealing with proteins that were once deemed “undruggable” using conventional small-molecule drugs. PROTACs are hetero-bifunctional molecules consisting of two ligands, connected by a chemical linker. As the field evolves, it becomes increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we aim to provide a thorough exploration of the impact of ML on de novo PROTAC design – an aspect of molecular design that has not been comprehensively reviewed despite its significance. Initially, we delve into the distinct characteristics of PROTAC linker design, underscoring the complexities required to create effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. 
By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for biologists, chemists, and ML practitioners alike in their pursuit of better design strategies for this new modality.

Digital discovery 11, 2158-2176 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00177j?page=search
Citations: 0
Learning material synthesis–process–structure–property relationship by data fusion: Bayesian co-regionalization N-dimensional piecewise function learning†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-27 DOI: 10.1039/D4DD00048J
A. Gilad Kusne, Austin McDannald and Brian DeCost

Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis–process–structure–property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis–process–structure–property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization and probability to merge knowledge across data sources into a unified model of synthesis–process–structure–property relationships. SAGE outputs a probabilistic posterior including the most likely relationship given the data along with proper uncertainty quantification. Beyond autonomous systems, SAGE will allow materials researchers to unify knowledge across their lab toward making better experiment design decisions.

Digital discovery 11, 2211-2225 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00048j?page=search
Citations: 0
AMPERE: automated modular platform for expedited and reproducible electrochemical testing†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-26 DOI: 10.1039/D4DD00203B
Jehad Abed, Yang Bai, Daniel Persaud, Jiheon Kim, Julia Witt, Jason Hattrick-Simpers and Edward H. Sargent

Rapid and reliable electrochemical screening is critical to accelerate the development of catalysts for sustainable energy generation and storage. This paper introduces an automated and modular platform for expedited and reproducible electrochemical testing (AMPERE), designed to enhance the efficiency and reliability of multivariate optimization. The platform integrates a liquid-handling robot with custom-made modular array reactors, offering sample preparation and electrochemical testing in the same platform. Additionally, we use offline inductively coupled plasma optical emission spectroscopy (ICP-OES) to measure metal concentrations in the electrolyte after the reaction, which serves as a proxy for assessing the electrochemical stability. We use the platform to conduct 168 experiments continuously in less than 40 hours to examine the influence of catalyst ink formulation on the performance of Ir, Ru, IrO2, and RuO2 for the oxygen evolution reaction (OER) in acid. We specifically investigate the role of solvent type and concentration, catalyst concentration, and binder content on the performance. We find that Ru/RuO2 catalysts show improvements in activity that are not directly linked to improvements in the electrochemical surface area or inversely correlated to Ru dissolution. This suggests a complex interplay between the catalytic performance of the drop-casted catalyst film and ink formulation. AMPERE simplifies catalyst preparation and testing at large scale, making it faster, more reliable, and accessible for widespread use.

Digital discovery 11, 2265-2274 (2024). PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00203b?page=search
Citations: 0
Composite machine learning strategy for natural products taxonomical classification and structural insights†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-23 DOI: 10.1039/D4DD00155A
Qisong Xu, Alan K. X. Tan, Liangfeng Guo, Yee Hwee Lim, Dillon W. P. Tay and Shi Jun Ang

Taxonomical classification of natural products (NPs) can assist in genomic and phylogenetic analysis of source organisms and facilitate streamlining of bioprospecting efforts. Here, a composite machine learning strategy marrying graph convolutional neural networks (GCNNs) and eXtreme Gradient Boosting (XGB) is proposed and validated for taxonomical classification of NPs in five kingdoms (Animalia, Bacteria, Chromista, Fungi, and Plantae). Our composite model, trained on 133 092 NPs from the LOTUS database, achieved five-fold cross-validated classification accuracy of 97.4%. When employed to classify out-of-sample NPs from the NP Atlas database, accuracies of 82.8% for bacteria and 86.6% for fungi were obtained. Dimensionality-reduced representations of the molecular embeddings from our composite model revealed distinct clusters of NPs that suggest a basis for enhanced classification performance. The top critical substructures from the NPs of each kingdom were also identified and compared to provide insights on structure–taxonomy relationships. Overall, this study showcases the potential of composite machine learning models for robust taxonomical classification of NPs, which can streamline discovery of NPs.
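
For readers unfamiliar with the "five-fold cross-validated accuracy" quoted in the abstract, the following is a minimal sketch of how such a number is typically computed. It is not the authors' GCNN + XGB pipeline: the classifier here is a toy majority-class stand-in, the kingdom labels are made-up illustrative data, and the function names `k_fold_indices`, `majority_class`, and `cross_validated_accuracy` are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(labels):
    """Toy stand-in model (assumption): predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validated_accuracy(labels, k=5, seed=0):
    """Mean held-out accuracy over k folds; assumes len(labels) >= k."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held]
        guess = majority_class(train)
        # Score the fitted model only on the held-out fold.
        correct = sum(labels[i] == guess for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


if __name__ == "__main__":
    # Illustrative labels only; not data from LOTUS or NP Atlas.
    labels = ["Plantae"] * 60 + ["Fungi"] * 25 + ["Bacteria"] * 15
    print(round(cross_validated_accuracy(labels), 3))  # 0.6
```

Each sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting. A real study would swap the toy classifier for the trained model and would usually stratify the folds by class.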

Digital discovery, 11, 2192-2200. Published 2024-09-23. DOI: https://doi.org/10.1039/D4DD00155A. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00155a?page=search
Citations: 0
Stability and transferability of machine learning force fields for molecular dynamics applications†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-09-21 DOI: 10.1039/D4DD00140K
Salatan Duangdangchote, Dwight S. Seferos and Oleksandr Voznyy

In this study, we focus on simplifying the generation of Machine Learning Force Fields (MLFFs) for Molecular Dynamics (MD) simulations of inorganic materials, with an emphasis on sustainable use of computational resources. We evaluate the efficiency and accuracy of existing state-of-the-art graph neural network (GNN) models and introduce new benchmarks that go beyond conventional mean absolute error on forces and energies. We showcase our methodology on the example of lithium-ion conductor materials, paving the way to a broader screening of ionic conductors for batteries and fuel cells.
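
As context for the "conventional mean absolute error on forces and energies" that the abstract treats as the baseline metric, here is a minimal sketch of how those two numbers are usually computed. The abstract does not describe the new benchmarks, so none are shown. The function names `energy_mae` and `force_mae` and the sample values are hypothetical, and no particular unit system is assumed.

```python
def energy_mae(predicted, reference):
    """Mean absolute error over per-structure energies."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def force_mae(predicted, reference):
    """Mean absolute error over every Cartesian force component of every atom."""
    diffs = [
        abs(pc - rc)
        for p_atom, r_atom in zip(predicted, reference)
        for pc, rc in zip(p_atom, r_atom)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Made-up numbers for illustration; not results from the paper.
    e_pred, e_ref = [1.0, 2.0], [1.5, 2.0]
    f_pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    f_ref = [(0.0, 0.0, 0.25), (1.0, 0.0, 0.75)]
    print(energy_mae(e_pred, e_ref))  # 0.25
    print(force_mae(f_pred, f_ref))   # 0.25
```

Both metrics average over static reference configurations. They say nothing on their own about whether a simulation driven by the model stays stable over long trajectories, which is one reason a study might look beyond them.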

Digital discovery, 11, 2177-2182. Published 2024-09-21. DOI: https://doi.org/10.1039/D4DD00140K. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00140k?page=search
Citations: 0