机器学习、概念密度泛函理论与生物化学的协同作用:芳香族胺类致突变性的无代码可解释预测模型。

IF 5.6 2区 化学 Q1 CHEMISTRY, MEDICINAL Journal of Chemical Information and Modeling Pub Date : 2024-11-25 Epub Date: 2024-11-11 DOI:10.1021/acs.jcim.4c01246
Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro
{"title":"机器学习、概念密度泛函理论与生物化学的协同作用:芳香族胺类致突变性的无代码可解释预测模型。","authors":"Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro","doi":"10.1021/acs.jcim.4c01246","DOIUrl":null,"url":null,"abstract":"<p><p>This study synergizes machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs) with a fully No-Code methodology using a comprehensive data set of 251 AAs, Leave-One-Out-Cross-Validation (LOOCV), and three distinct data splits. Our research employs the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely, PSL, GCV, and CDP schemes, and the newly introduced Log QP descriptor to approximate Log P information. SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a two hidden layer Multilayer Perceptron. The results indicate that the second CDP definition for the electrophilicity in both vacuum and aqueous phases and also the newly presented Log QP descriptors are the most important ones for predicting the mutagenic activity of AA (namely ω<sub>+Vac</sub><sup>CDP2+</sup>, ω<sub>+Aq</sub><sup>CDP2+</sup>, and LogQP1<sub>+Vac</sub>, respectively). The results indicate that metabolic activation, aqueous solvent properties, and the CDP electrophilicity schemes and Log QP should be considered when building predictive models for the mutagenic activity of AA. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will expand these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activities and other biological phenomena.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8510-8520"},"PeriodicalIF":5.6000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines.\",\"authors\":\"Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro\",\"doi\":\"10.1021/acs.jcim.4c01246\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This study synergizes machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs) with a fully No-Code methodology using a comprehensive data set of 251 AAs, Leave-One-Out-Cross-Validation (LOOCV), and three distinct data splits. Our research employs the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely, PSL, GCV, and CDP schemes, and the newly introduced Log QP descriptor to approximate Log P information. SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a two hidden layer Multilayer Perceptron. The results indicate that the second CDP definition for the electrophilicity in both vacuum and aqueous phases and also the newly presented Log QP descriptors are the most important ones for predicting the mutagenic activity of AA (namely ω<sub>+Vac</sub><sup>CDP2+</sup>, ω<sub>+Aq</sub><sup>CDP2+</sup>, and LogQP1<sub>+Vac</sub>, respectively). The results indicate that metabolic activation, aqueous solvent properties, and the CDP electrophilicity schemes and Log QP should be considered when building predictive models for the mutagenic activity of AA. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will expand these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activities and other biological phenomena.</p>\",\"PeriodicalId\":44,\"journal\":{\"name\":\"Journal of Chemical Information and Modeling \",\"volume\":\" \",\"pages\":\"8510-8520\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2024-11-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Chemical Information and Modeling \",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://doi.org/10.1021/acs.jcim.4c01246\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/11/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MEDICINAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Information and Modeling ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jcim.4c01246","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/11 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
引用次数: 0

摘要

本研究将机器学习 (ML) 与概念密度泛函理论 (CDFT) 相结合,利用包含 251 种芳香胺 (AA) 的综合数据集、留空交叉验证 (LOOCV) 和三种不同的数据拆分,采用完全无代码方法开发出符合 OECD 标准的芳香胺 (AA) 诱变活性预测模型。我们的研究采用了以稳健性和快速性著称的 GFN2-xTB方法来计算真空相和水相中致癌物质及其活化代谢物的描述符。我们评估了 CDFT 中不同亲电性理论定义的有效性,即 PSL、GCV 和 CDP 方案,以及新引入的 Log QP 描述符以近似 Log P 信息。使用 SPAARC、RandomTree 和 JCHAID* ML 方法建立了可解释的预测模型,具有高度稳健的内部验证(平均正确分类率 = 76%,平均 Kappa = 0.29)和外部验证(平均正确分类率 = 79%,平均 Kappa = 0.33)指标,并将结果与双隐层多层感知器的结果进行了比较。结果表明,真空和水相亲电性的第二个 CDP 定义以及新提出的 Log QP 描述因子(分别为 ω+VacCDP2+、ω+AqCDP2+ 和 LogQP1+Vac)是预测 AA 诱变活性最重要的描述因子。结果表明,在建立 AA 诱变活性预测模型时,应考虑代谢活化、水溶剂特性、CDP 亲电方案和 Log QP。这项研究为 QSAR 研究提供了一种可复制的无代码方法,使更多的人可以使用高级 ML 和 CDFT 应用。未来的工作将把这些方法扩展到其他化合物家族,从而提高诱变活性和其他生物现象研究的预测能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines.

This study synergizes machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs) with a fully No-Code methodology using a comprehensive data set of 251 AAs, Leave-One-Out-Cross-Validation (LOOCV), and three distinct data splits. Our research employs the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely, PSL, GCV, and CDP schemes, and the newly introduced Log QP descriptor to approximate Log P information. SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a two hidden layer Multilayer Perceptron. The results indicate that the second CDP definition for the electrophilicity in both vacuum and aqueous phases and also the newly presented Log QP descriptors are the most important ones for predicting the mutagenic activity of AA (namely ω+VacCDP2+, ω+AqCDP2+, and LogQP1+Vac, respectively). The results indicate that metabolic activation, aqueous solvent properties, and the CDP electrophilicity schemes and Log QP should be considered when building predictive models for the mutagenic activity of AA. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will expand these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activities and other biological phenomena.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
9.80
自引率
10.70%
发文量
529
审稿时长
1.4 months
期刊介绍: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.
期刊最新文献
Essential Considerations for Free Energy Calculations of RNA-Small Molecule Complexes: Lessons from the Theophylline-Binding RNA Aptamer. MGT: Machine Learning Accelerates Performance Prediction of Alloy Catalytic Materials. Fatty Alcohol Membrane Model for Quantifying and Predicting Amphiphilicity. Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties. EvaluationMaster: A GUI Tool for Structure-Based Virtual Screening Evaluation Analysis and Decision-Making Support.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1