Minghua Zhu, Zijun Xiao, Tao Zhang, Guanghua Lu
Journal of Hazardous Materials, Volume 482, Article 136606. Published 2024-11-20. DOI: 10.1016/j.jhazmat.2024.136606. https://www.sciencedirect.com/science/article/pii/S030438942403187X
Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish
Accurate prediction of bioaccumulation parameters is essential for assessing the exposure, hazards, and risks of chemicals. However, most models for predicting bioaccumulation parameters are individual models built on a single algorithm and lack model interpretation. As a result, their prediction accuracy is limited by the inherent constraints of that algorithm, and their interpretability is weak. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanations (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models achieved higher prediction accuracy than both the individual models developed in this study and those from previous research, with a coefficient of determination (R²) of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based methodology (abbreviated as AD_SAL). The optimal EL models, together with AD_SAL, were successfully applied to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method identified key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, and molecular weight and volume. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
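The abstract does not state which base algorithms the EL models combine or how their outputs are merged. As a minimal sketch under that uncertainty, the Python below averages two heterogeneous regressors (a one-descriptor least-squares line and a k-nearest-neighbour model, both hypothetical choices) and scores the ensemble with the coefficient of determination, R² = 1 - SS_res/SS_tot, the metric the abstract reports. The function names and toy numbers are illustrative assumptions, not the authors' models or data.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x on a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbour regression: mean target of the k closest training points."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def fit_ensemble(xs, ys):
    """Averaging ensemble: every member is trained on the same data; predictions are averaged."""
    members = [fit_linear(xs, ys), fit_knn(xs, ys)]
    return lambda x: mean(m(x) for m in members)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - my) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Toy, made-up numbers (not data from the study): one descriptor x and a target y.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [0.8, 1.5, 2.4, 2.9, 3.8, 4.1]
x_val, y_val = [2.5, 4.5], [1.9, 3.4]

model = fit_ensemble(x_train, y_train)
print("validation R^2:", round(r2_score(y_val, [model(x) for x in x_val]), 3))
```

Simple averaging is only one ensemble strategy; stacking (training a meta-model on the base models' predictions) and boosting are common alternatives, and the sketch should not be read as the combination scheme used in the study.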
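SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory: feature i receives φ_i = Σ over S ⊆ N∖{i} of |S|!(n - |S| - 1)!/n! · [v(S ∪ {i}) - v(S)]. The brute-force sketch below computes this exactly for a tiny model, assuming that v(S) evaluates the model with the features in S taken from the instance and all others from a single baseline point. This is a simplification: the shap package typically averages over a background dataset and uses faster estimators. The function `shapley_values` and the toy model are hypothetical illustrations.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley value of each feature of f at instance x, relative to one baseline point.

    The coalition value v(S) evaluates f with features in S taken from x and every
    other feature taken from baseline. This needs about 2^n model calls per feature,
    so it is only practical for a handful of features.
    """
    n = len(x)

    def v(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                total += weight * (v(s | {i}) - v(s))
        phi.append(total)
    return phi


# Toy model over three hypothetical descriptors, with an interaction between the first two.
def toy_model(z):
    return 0.5 + 1.2 * z[0] - 0.3 * z[1] + 0.4 * z[0] * z[1] + 0.1 * z[2] ** 2


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

Whatever the model, the attributions sum to f(x) - f(baseline) (the efficiency property), which is what makes SHAP explanations additive. An interaction term such as z[0] * z[1] is split equally between the two features involved.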
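The abstract names a structure-activity landscape-based applicability domain (AD_SAL) but does not say how it is computed. As background only, structure-activity landscape analyses often build on the structure-activity landscape index of Guha and Van Drie, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)). Large values flag "activity cliffs": pairs of structurally similar compounds with very different endpoint values, where similarity-based predictions are least reliable. The sketch below computes SALI using Tanimoto similarity on set-based fingerprints. The function names, fingerprint representation, and threshold are hypothetical, and this is not the authors' AD_SAL procedure.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(union)


def sali(act_i, act_j, fp_i, fp_j):
    """Structure-activity landscape index: |A_i - A_j| / (1 - sim(i, j))."""
    sim = tanimoto(fp_i, fp_j)
    if sim == 1.0:
        # Identical fingerprints make SALI undefined; flag any activity difference as infinite.
        return 0.0 if act_i == act_j else float("inf")
    return abs(act_i - act_j) / (1.0 - sim)


def activity_cliffs(compounds, threshold):
    """Pairs (i, j, SALI) whose SALI exceeds threshold, largest first.

    compounds is a list of (activity, fingerprint_set) tuples.
    """
    hits = []
    for (i, (a_i, fp_i)), (j, (a_j, fp_j)) in combinations(enumerate(compounds), 2):
        score = sali(a_i, a_j, fp_i, fp_j)
        if score > threshold:
            hits.append((i, j, score))
    return sorted(hits, key=lambda t: t[2], reverse=True)


# Made-up toy values: (hypothetical log-scale endpoint, fingerprint on-bits).
toy = [(3.0, {1, 2, 3}), (1.0, {2, 3, 4}), (3.1, {7, 8})]
print(activity_cliffs(toy, threshold=1.0))
```

Production code would typically use a cheminformatics toolkit to generate fingerprints. Plain Python sets keep the sketch self-contained.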