
Digital Discovery: latest articles

Digital chemistry: navigating the confluence of computation and experimentation – definition, status quo, and future perspective
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-19 DOI: 10.1039/d4dd00130c
Stefan Bräse
Digital chemistry represents a transformative approach integrating computational methods, digital data, and automation within the chemical sciences. It is defined by the use of digital toolkits and algorithms to simulate, predict, accelerate, and analyze chemical processes and properties, augmenting traditional experimental methods. The current status quo of digital chemistry is marked by rapid advancements in several key areas: high-throughput screening, machine learning models, quantum chemistry, and laboratory automation. These technologies have enabled unprecedented speeds in discovering and optimizing new molecules, materials, and reactions. Digital retrosynthesis and structure–activity prediction tools have supported these endeavors. Furthermore, the integration of large language models and robotics in chemistry labs (as demonstrated, for example, in self-driving labs) has begun to automate routine tasks and complex decision-making processes. Looking forward, the future of digital and digitalized chemistry is poised for significant growth, driven by the increasing accessibility of computational resources, the expansion of chemical databases, and the refinement of artificial intelligence algorithms. This evolution promises to accelerate innovation in drug discovery, materials science, and sustainable manufacturing, ultimately leading to more efficient, cost-effective, and environmentally friendly chemical research and production. The challenge lies in advancing the technology itself, fostering interdisciplinary collaboration, and ensuring the ethical use of digital tools in chemical research.
Citations: 0
Towards a science exocortex
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-19 DOI: 10.1039/d4dd00178h
Kevin G. Yager
Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with generative AI enabling automation of text analysis, text generation, and simple decision making or reasoning. The impact on science is only just beginning, but the opportunity is significant since scientific research relies fundamentally on extended chains of cognitive work. Here, we review the state of the art in agentic AI systems, and discuss how these methods could be extended to have even greater impact on science. We propose the development of an exocortex, a synthetic extension of a person's cognition. A science exocortex could be designed as a swarm of AI agents, with each agent individually streamlining specific researcher tasks, and whose inter-communication leads to emergent behavior that greatly extends the researcher's cognition and volition.
Citations: 0
Machine learning the screening factor in the soft bond valence approach for rapid crystal structure estimation
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-16 DOI: 10.1039/d4dd00152d
Keisuke Kameda, Takaaki Ariga, Kazuma Ito, Manabu Ihara, Sergei Manzhos
The development of novel functional ceramics is critically important for several applications, including the design of better electrochemical batteries and fuel cells, in particular solid oxide fuel cells. Computational prescreening and selection of such materials can help discover novel materials but is also challenging due to the high cost of the electronic structure calculations that would be needed to compute the structures and properties of interest, such as the material's stability and ion diffusion properties. The soft bond valence (SoftBV) approach is attractive for rapid prescreening among multiple compositions and structures, but the simplicity of the approximation can make the results inaccurate. In this study, we explore the possibility of enhancing the accuracy of the SoftBV approach when estimating crystal structures by adapting the parameters of the approximation to the chemical composition. Specifically, using the examples of perovskite- and spinel-type oxides that have been proposed as promising solid-state ionic conductors, the screening factor – an independent parameter of the SoftBV approximation – is modeled using linear and non-linear methods as a function of descriptors of the chemical composition. We find that making the screening factor a function of composition can noticeably improve the ability of the SoftBV approximation to correctly model structures, in particular new, putative crystal structures whose structural parameters are yet unknown. We also analyze the relative importance of nonlinearity and coupling in improving the model and find that while the quality of the model is improved by including nonlinearity, coupling is relatively unimportant. While using a neural network showed practically no improvement over linear regression, the recently proposed GPR-NN method, a hybrid of a single-hidden-layer neural network and kernel regression, showed substantial improvement, enabling the prediction of structural parameters of new ceramics with accuracy on the order of 1%.
Citations: 0
Linear graphlet models for accurate and interpretable cheminformatics
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-16 DOI: 10.1039/d4dd00089g
Michael Tynes, Michael G. Taylor, Jan Janssen, Daniel J. Burrill, Danny Perez, Ping Yang, Nicholas Lubbers
Advances in machine learning have given rise to a plurality of data-driven methods for predicting chemical properties from molecular structure. For many decades, the cheminformatics field has relied heavily on structural fingerprinting, while in recent years much focus has shifted toward leveraging highly parameterized deep neural networks which usually maximize accuracy. Beyond accuracy, to be useful and trustworthy in scientific applications, machine learning techniques often need intuitive explanations for model predictions and uncertainty quantification techniques so a practitioner might know when a model is appropriate to apply to new data. Here we revisit graphlet histogram fingerprints and introduce several new elements. We show that linear models built on graphlet fingerprints attain accuracy that is competitive with the state of the art while retaining an explainability advantage over black-box approaches. We show how to produce precise explanations of predictions by exploiting the relationships between molecular graphlets and show that these explanations are consistent with chemical intuition, experimental measurements, and theoretical calculations. Finally, we show how to use the presence of unseen fragments in new molecules to adjust predictions and quantify uncertainty.
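
The idea of a linear model over graphlet counts can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes graphlets are connected atom subsets of at most three atoms, ignores bond orders, and uses a simplified key (element plus degree inside the graphlet) that is only guaranteed to distinguish graphlets up to that size. The weights are hypothetical placeholders, not fitted values.

```python
from collections import Counter


def connected_subsets(adjacency, max_size):
    """Enumerate all connected atom subsets with at most max_size atoms."""
    found = set()
    frontier = {frozenset([atom]) for atom in adjacency}
    while frontier:
        found |= frontier
        grown = set()
        for subset in frontier:
            if len(subset) == max_size:
                continue  # already at the size limit, do not grow further
            for atom in subset:
                for neighbour in adjacency[atom]:
                    if neighbour not in subset:
                        grown.add(subset | {neighbour})
        frontier = grown - found
    return found


def graphlet_key(subset, elements, adjacency):
    """Simplified key: sorted (element, degree within the graphlet) pairs."""
    return tuple(sorted((elements[a], len(adjacency[a] & subset)) for a in subset))


def graphlet_histogram(elements, adjacency, max_size=3):
    """Count how often each graphlet key occurs in the molecule."""
    return Counter(
        graphlet_key(s, elements, adjacency)
        for s in connected_subsets(adjacency, max_size)
    )


def linear_predict(histogram, weights):
    """Return (prediction, per-graphlet contributions, unseen graphlet keys)."""
    contributions = {k: weights[k] * n for k, n in histogram.items() if k in weights}
    unseen = sorted(k for k in histogram if k not in weights)
    return sum(contributions.values()), contributions, unseen


# Ethanol heavy atoms C-C-O (hydrogens omitted).
ELEMENTS = ["C", "C", "O"]
ADJACENCY = {0: {1}, 1: {0, 2}, 2: {1}}

# Hypothetical weights for illustration only.
WEIGHTS = {
    (("C", 0),): 1.0,
    (("O", 0),): 2.0,
    (("C", 1), ("C", 1)): 0.5,
}

hist = graphlet_histogram(ELEMENTS, ADJACENCY)
prediction, contributions, unseen = linear_predict(hist, WEIGHTS)
print(prediction, unseen)
```

Because the prediction is a plain sum, each graphlet's contribution can be read off directly, and graphlets absent from the weight table are reported rather than silently ignored. How such unseen fragments are used to adjust predictions or estimate uncertainty is described in the paper itself and is not reproduced here.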
Citations: 0
Digichem: computational chemistry for everyone†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-16 DOI: 10.1039/D4DD00147H
Oliver S. Lee, Malte C. Gather and Eli Zysman-Colman

We describe a new tool for the efficient management of computational chemistry. Digichem is a program that automates and simplifies nearly the entire computational pipeline, including large-scale batch submission of calculations, analysis and results parsing, the generation of 3D density plots and 2D graphs of calculation data, storage and retrieval of calculation results to a database, and automated handling of multi-step jobs. The program is designed to reduce the tedium and likelihood of human error for researchers of all skill-levels but is particularly targeted towards novice users who otherwise may find the barrier to entry to computational chemistry unnecessarily high. To date, this program has been used to successfully run and analyse over 50 000 individual calculations, evidencing its usefulness and utility. The Digichem program is presently released under a free-to-use license, and components of the Digichem system are additionally available under an open-source license.

Citations: 0
Every atom counts: predicting sites of reaction based on chemistry within two bonds†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-16 DOI: 10.1039/D4DD00092G
Ching Ching Lam and Jonathan M. Goodman

How much chemistry can be described by looking only at each atom, its neighbours and its next-nearest neighbours? We present a method for predicting reaction sites based only on a simple, two-bond model. Machine learning classification models were trained and evaluated using atom-level labels and descriptors, including bond strength and connectivity. Despite limitations in covering only local chemical environments, the models achieved over 80% accuracy even with challenging datasets that cover a diverse chemical space. Whilst this simplistic model is necessarily incomplete, it describes a large amount of interesting chemistry.
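
What "chemistry within two bonds" means for a single atom can be sketched as a breadth-first search over the molecular graph. This is a minimal illustration, not the authors' code: the bond-list input format, the function names, and the shell descriptor are assumptions made for the example, and the descriptors used in the paper (such as bond strength) are not modelled.

```python
from collections import deque


def two_bond_environment(bonds, atom, max_bonds=2):
    """Return {atom_index: bond_distance} for atoms within max_bonds of `atom`."""
    # Build an adjacency list from the (i, j) bond pairs.
    adjacency = {}
    for i, j in bonds:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)

    distances = {atom: 0}
    queue = deque([atom])
    while queue:
        current = queue.popleft()
        if distances[current] == max_bonds:
            continue  # do not expand beyond the cutoff
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def shell_descriptor(elements, environment):
    """Centre element plus the sorted elements one and two bonds away."""
    centre = next(a for a, d in environment.items() if d == 0)
    shell = lambda d: tuple(sorted(elements[a] for a, dist in environment.items() if dist == d))
    return elements[centre], shell(1), shell(2)


# Ethyl acetate heavy atoms: CH3-C(=O)-O-CH2-CH3
# indices: 0 CH3, 1 carbonyl C, 2 carbonyl O, 3 ester O, 4 CH2, 5 CH3
ELEMENTS = ["C", "C", "O", "O", "C", "C"]
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]

env = two_bond_environment(BONDS, atom=1)
descriptor = shell_descriptor(ELEMENTS, env)
print(env, descriptor)
```

For the carbonyl carbon, the terminal CH3 of the ethyl group lies three bonds away and is therefore excluded, which is exactly the locality restriction the abstract describes. An atom-level classifier would take descriptors of this kind as input.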

Citations: 0
Introduction to “Accelerate Conference 2022”
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-15 DOI: 10.1039/D4DD90036G
Keith A. Brown, Fedwa El Mellouhi and Claudiane Ouellet-Plamondon

A graphical abstract is available for this content

Citations: 0
Insights into pharmacokinetic properties for exposure chemicals: predictive modelling of human plasma fraction unbound (fu) and hepatocyte intrinsic clearance (Clint) data using machine learning†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-15 DOI: 10.1039/D4DD00082J
Souvik Pore and Kunal Roy

An external chemical substance (which may be a medicinal drug or an exposome), after ingestion, undergoes a series of dynamic movements and metabolic alterations known as pharmacokinetic events while exerting different physiological actions on the body (pharmacodynamic events). Plasma protein binding and hepatocyte intrinsic clearance are crucial pharmacokinetic events that influence the efficacy and safety of a chemical substance. Plasma protein binding determines the fraction of a chemical compound bound to plasma proteins, affecting the distribution and duration of action of the compound. Compounds with high protein binding may have a smaller free fraction available for pharmacological activity, potentially altering their therapeutic effects. On the other hand, hepatocyte intrinsic clearance represents the liver's capacity to eliminate a chemical compound through metabolism. It is a critical determinant of the elimination half-life of the chemical substance. Understanding hepatic clearance is essential for predicting chemical toxicity and designing safety guidelines. Recently, the huge expansion of computational resources has led to the development of various in silico predictive models as an alternative to animal experimentation. In this research work, we developed different types of machine learning (ML) based quantitative structure–activity relationship (QSAR) models for the prediction of a compound's plasma protein fraction unbound values and hepatocyte intrinsic clearance. Here, we have developed regression-based models with the protein fraction unbound (fu) human data set (n = 1812) and a classification-based model with the hepatocyte intrinsic clearance (Clint) human data set (n = 1241) collected from the recently published ICE (Integrated Chemical Environment) database. We have further analyzed the influence of the plasma protein binding on the hepatocyte intrinsic clearance, by considering the compounds having both types of target variable values. For the fraction unbound data set, the support vector machine (SVM) model shows superior results compared to other models, but for the hepatocyte intrinsic clearance data set, random forest (RF) shows the best results. We have further made predictions of these important pharmacokinetic parameters through the similarity-based read-across (RA) method. A Python-based tool for predicting the endpoints has been developed and made available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home/pkpy-tool.
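
Similarity-based read-across can be sketched generically as a similarity-weighted average over the most similar training compounds. The sketch below is one common formulation and not necessarily the scheme used by the authors: it assumes fingerprints are sets of integer feature identifiers compared by Tanimoto similarity, and the training values are invented placeholders rather than data from the ICE database.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def read_across(query, training, k=3):
    """Similarity-weighted mean of the values of the k most similar compounds."""
    scored = sorted(
        ((tanimoto(query, fp), value) for fp, value in training),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("query shares no features with any training compound")
    return sum(sim * value for sim, value in scored) / total


# Placeholder training set: (fingerprint feature set, fraction unbound value).
# These numbers are invented for illustration only.
TRAINING = [
    ({1, 2, 3, 4}, 0.10),
    ({1, 2, 5}, 0.40),
    ({6, 7}, 0.90),
]

QUERY = {1, 2, 3}
estimate = read_across(QUERY, TRAINING, k=2)
print(round(estimate, 3))
```

Here the two nearest neighbours have similarities 0.75 and 0.5, so the estimate is pulled towards the value of the closer compound. In practice the fingerprints would come from a cheminformatics toolkit and the choice of k and similarity measure would be tuned on held-out data.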

Citations: 0
Dismai-Bench: benchmarking and designing generative models using disordered materials and interfaces†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-15 DOI: 10.1039/D4DD00100A
Adrian Xiao Bin Yong, Tianyu Su and Elif Ertekin

Generative models have received significant attention in recent years for materials science applications, particularly in the area of inverse design for materials discovery. However, these models are usually assessed based on newly generated, unverified materials, using heuristic metrics such as charge neutrality, which provide a narrow evaluation of a model's performance. Also, current efforts for inorganic materials have predominantly focused on small, periodic crystals (≤20 atoms), even though the capability to generate large, more intricate and disordered structures would expand the applicability of generative modeling to a broader spectrum of materials. In this work, we present the Disordered Materials & Interfaces Benchmark (Dismai-Bench), a generative model benchmark that uses datasets of disordered alloys, interfaces, and amorphous silicon (256–264 atoms per structure). Models are trained on each dataset independently, and evaluated through direct structural comparisons between training and generated structures. Such comparisons are only possible because the material system of each training dataset is fixed. Benchmarking was performed on two graph diffusion models and two (coordinate-based) U-Net diffusion models. The graph models were found to significantly outperform the U-Net models due to the higher expressive power of graphs. While noise in the less expressive models can assist in discovering materials by facilitating exploration beyond the training distribution, these models face significant challenges when confronted with more complex structures. To further demonstrate the benefits of this benchmarking in the development process of a generative model, we considered the case of developing a point-cloud-based generative adversarial network (GAN) to generate low-energy disordered interfaces. We tested different GAN architectures and identified reasons for good/poor performance. 
We show that the best performing architecture, CryinGAN, outperforms the U-Net models, and is competitive against the graph models despite its lack of invariances and weaker expressive power. This work provides a new framework and insights to guide the development of future generative models, whether for ordered or disordered materials.

Citations: 0
Active learning for regression of structure-property mapping: the importance of sampling and representation
Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-08-12 DOI: 10.1039/d4dd00073k
Hao Liu, Berkay Yucel, Baskar Ganapathysubramanian, Surya R Kalidindi, Daniel Wheeler, Olga Wodo
Data-driven approaches now allow for systematic mappings from materials microstructures to materials properties. In particular, diverse data-driven approaches are available to establish mappings using varied microstructure representations, each posing different demands on the resources required to calibrate machine learning models. In this work, using active learning regression and iteratively increasing the data pool, three questions are explored: (a) What is the minimal subset of data required to train a predictive structure-property model with sufficient accuracy? (b) Is this minimal subset highly dependent on the sampling strategy managing the data pool? And (c) what is the cost associated with the model calibration? Using case studies with different types of microstructure (composite vs. spinodal), dimensionality (two- and three-dimensional), and properties (elastic and electronic), two separate microstructure representations are evaluated: graph-based descriptors derived from a graph representation of the microstructure and two-point correlation functions. This work demonstrates that as few as 5% of evaluations are required to calibrate robust data-driven structure-property maps when selections are made from a library of diverse microstructures. The findings show that both representations (graph-based descriptors and two-point correlation functions) can be effective with only a small quantity of property evaluations when combined with different active learning strategies. However, the dimensionality of the latent space differs substantially depending on the microstructure representation and active learning strategy.
Citations: 0