Grace Amabel Tabaaza, Bennet Nii Tackie-Otoo, Dzulkarnain B. Zaini, Daniel Asante Otchere, Bhajan Lal
Computational Toxicology (Journal Article, JCR Q2 Toxicology, impact factor 3.1), published 2023-05-01. DOI: 10.1016/j.comtox.2023.100266. https://www.sciencedirect.com/science/article/pii/S2468111323000075
Application of machine learning models to predict cytotoxicity of ionic liquids using VolSurf principal properties
Ionic liquids (ILs) are considered greener alternatives to traditional organic solvents because of their unique physical and chemical properties. Nevertheless, recent studies have shown that ILs can induce toxic effects in ecosystems, so their risk to aquatic life must be determined before they can be used successfully. Measuring the toxicity of many ILs experimentally across a broad range of conditions is highly demanding in time and resources, and at times impractical. Various studies have applied Quantitative Structure-Activity/Property Relationships (QSAR/QSPR) to predict IL toxicity expressed as EC50. In this study, five supervised machine learning models were trained and tested using nine VolSurf Principal Properties (PPs) as descriptors to predict cytotoxicity toward the rat leukemia cell line IPC-81. Eight feature selection techniques were then used to preprocess the data and improve the performance of the best of the preliminarily trained models. On the out-of-sample test set, the Extreme Gradient Boosting (XGBoost) model performed best, with the highest test score (R² = 0.79). It was also the most parsimonious (minimum AIC of 46.50), consistent (minimum RMSE of 0.45), and precise (minimum MAE of 0.32) in predicting IPC-81 cytotoxicity. The XGBoost feature importances confirmed that structural features of the IL cation, such as cationic hydrophilicity and side-chain length, have a significant impact on toxicity. Nevertheless, the anion also contributes to IL toxicity and needs to be considered in toxicity prediction. Among the tested feature selection techniques, random forest gave the largest improvement in model performance (lowest error metrics: AIC = 41.22, MAE = 0.31 and RMSE = 0.4259) but at a longer execution time, whereas the wrapper methods were the most effective at improving computational efficiency (i.e., they improved model performance with the shortest execution time). This study therefore advances QSPR-based toxicity prediction for new ILs through the application of machine learning and feature selection techniques.
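The abstract compares models by R², RMSE, MAE and AIC. The following is a minimal sketch of how these metrics can be computed from observed and predicted toxicity values. It assumes a continuous target such as log10(EC50) and the common least-squares form AIC = n·ln(RSS/n) + 2k. The abstract does not state the authors' exact AIC formula or how k is counted for a tree ensemble such as XGBoost. The function name `regression_metrics` and the parameter `n_params` are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], n_params: int
) -> Dict[str, float]:
    """Return R^2, RMSE, MAE and a least-squares AIC for one set of predictions.

    Assumption: AIC = n * ln(RSS / n) + 2 * n_params, a common form for
    regression with Gaussian errors; the paper's exact definition may differ.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    rss = sum(r * r for r in residuals)                # residual sum of squares
    mean_y = sum(y_true) / n
    tss = sum((t - mean_y) ** 2 for t in y_true)      # total sum of squares

    r2 = 1.0 - rss / tss if tss > 0 else float("nan")  # undefined for a constant target
    rmse = math.sqrt(rss / n)
    mae = sum(abs(r) for r in residuals) / n
    # ln(0) is undefined; a perfect fit gets -inf so it still ranks as "best".
    aic = n * math.log(rss / n) + 2 * n_params if rss > 0 else float("-inf")

    return {"r2": r2, "rmse": rmse, "mae": mae, "aic": aic}


if __name__ == "__main__":
    # Hypothetical toy values (e.g. log10 EC50), not data from the study.
    observed = [3.0, 4.0, 5.0, 6.0]
    predicted = [2.5, 4.0, 5.5, 6.0]
    print(regression_metrics(observed, predicted, n_params=2))
```

Lower RMSE, MAE and AIC indicate a better model. Because AIC adds a 2k penalty, a model with fewer parameters (or fewer selected features) scores lower at equal RSS. This is why AIC can serve as a parsimony criterion when comparing models or feature subsets.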
About the journal:
Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope includes:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE and multi-scale models
- Big data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- Substances ranging from metals and small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia and NGOs