Development of a Predictive Model for N-Dealkylation of Amine Contaminants Based on Machine Learning Methods.

IF 3.9 | Zone 3 (Environmental Science and Ecology) | JCR Q2 (Environmental Sciences) | Toxics | Pub Date: 2024-12-22 | DOI: 10.3390/toxics12120931
Shiyang Cheng, Qihang Zhang, Hao Min, Wenhui Jiang, Jueting Liu, Chunsheng Liu, Zehua Wang
Toxics, vol. 12, no. 12 (2024). Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728645/pdf/
Citations: 0

Abstract

Amines are widespread environmental pollutants that may pose health risks. In particular, N-dealkylation of amines mediated by cytochrome P450 enzymes (P450) can affect the safety of their metabolic transformation. However, conventional experimental and computational chemistry methods are poorly suited to high-throughput screening of N-dealkylation for emerging amine contaminants. Machine learning has been widely used to identify sources of environmental pollutants and to predict their toxicity, but its application to screening critical biotransformation pathways of organic pollutants has rarely been reported. In this study, we first constructed a large dataset comprising 286 emerging amine pollutants through a thorough search of databases and the literature. We then applied four machine learning methods (random forest, gradient boosting decision tree, extreme gradient boosting, and multi-layer perceptron) to develop binary classification models for N-dealkylation. The models were based on seven carefully selected molecular descriptors that represent reactivity fit and structural fit. Among the single-algorithm models, extreme gradient boosting achieved the highest prediction accuracy, 81.0%. The SlogP_VSA2 descriptor was the primary factor influencing predictions of N-dealkylation metabolism. Finally, an ensemble model that uses a consensus strategy to integrate three different algorithms generally outperformed any single algorithm, with an accuracy of 86.2%. The classification model developed in this work can therefore provide methodological support for high-throughput screening of N-dealkylation of amine pollutants.
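
The abstract does not say how the consensus strategy combines its three algorithms. As a minimal sketch, assuming a simple majority vote over binary labels (1 = undergoes N-dealkylation, 0 = does not), the idea could look like the following. The function names and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import Sequence


def consensus_vote(model_predictions: Sequence[Sequence[int]]) -> list[int]:
    """Return the majority label per sample across several models.

    Each inner sequence holds one model's 0/1 predictions.
    Assumption: plain majority voting; the paper may use another rule.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    consensus = []
    for votes in zip(*model_predictions):
        # With an odd number of binary voters there is always a strict majority.
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels invented for illustration only (not from the 286-compound dataset).
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]  # hypothetical classifier A
model_b = [1, 1, 1, 1, 0, 0]  # hypothetical classifier B
model_c = [0, 0, 1, 1, 1, 0]  # hypothetical classifier C

y_consensus = consensus_vote([model_a, model_b, model_c])
print("consensus accuracy:", accuracy(y_true, y_consensus))
for name, pred in (("A", model_a), ("B", model_b), ("C", model_c)):
    print(f"model {name} accuracy:", accuracy(y_true, pred))
```

In this toy case each model errs on different samples, so the vote corrects every individual mistake. That is the general reason a consensus model can outperform its members, although the gain depends on how correlated the members' errors are.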

Source journal: Toxics
Subject area: Chemical Engineering, Chemical Health and Safety
CiteScore: 4.50
Self-citation rate: 10.90%
Articles published: 681
Review time: 6 weeks
Journal description: Toxics (ISSN 2305-6304) is an international, peer-reviewed, open access journal that provides an advanced forum for studies related to all aspects of toxic chemicals and materials. It publishes reviews, regular research papers, and short communications. Its aim is to encourage scientists to publish their experimental and theoretical results in detail. There is therefore no restriction on the maximum length of papers, although authors should write clearly and concisely. Full experimental details must be provided so that the results can be reproduced. Electronic files or software containing the full details of calculations and experimental procedures can be deposited as supplementary material if they cannot be published along with the text.