Article type: Letter to the Editor
Authors: Souichi Oka, Yoshiyasu Takefuji
Journal: Journal of Hazardous Materials, volume 493, Article 138366 (impact factor 10.6; JCR Q1, Engineering, Environmental)
DOI: 10.1016/j.jhazmat.2025.138366
Published: 2025-08-05 (Epub 2025-04-22)
URL: https://www.sciencedirect.com/science/article/pii/S0304389425012816
Comments on "Dialogue between algorithms and soil: Machine learning unravels the mystery of phthalates pollution in soil" by Pan et al. (2025)
Pan et al. demonstrated the superior predictive performance of their machine learning (ML) models for soil phthalate (PAE) concentrations, highlighting the critical role of feature importance as assessed by SHapley Additive exPlanations (SHAP). Notably, the Multilayer Perceptron (MLP) model achieved the highest performance (R² = 0.8637), followed by support vector regression (SVR) and XGBoost. However, concerns persist regarding the reliability of the feature importance derived from these models and their SHAP interpretations. Specifically, predictive accuracy does not guarantee the validity of feature rankings, because tree-based, neural network, and kernel-based methods each carry inherent biases, which are further exacerbated by SHAP's dependence on model outputs. To mitigate these biases, integrating robust statistical methods is crucial. Techniques such as Spearman's rho, Kendall's tau, Goodman-Kruskal's gamma, Somers' delta, and Hoeffding's dependence, combined with p-value analysis, offer unbiased assessments. Integrating these statistical methods alongside ML models ensures a more reliable evaluation of feature importance in environmental risk modeling. Consequently, future research should prioritize methodologies that combine ML with rigorous statistical validation to enhance accuracy and reduce biases.
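As a rough illustration of the model-free checks the letter recommends, the sketch below computes Spearman's rho and Kendall's tau between each feature and a target, together with a p-value. Everything in it is an assumption made for illustration: the data are synthetic, the names `feature_a`, `feature_b`, and `target` are hypothetical, and the p-value comes from a simple permutation test, which the letter does not specify (it says only "p-value analysis"). In practice a library routine such as `scipy.stats.spearmanr` or `scipy.stats.kendalltau` would normally be used; the standard-library version here just makes the computation explicit.

```python
import math
import random
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


def permutation_p_value(stat, x, y, n_perm=500, seed=0):
    """Two-sided permutation p-value for an association statistic."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(stat(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic data (illustrative only, not from Pan et al.):
# the target depends monotonically but nonlinearly on feature_a,
# while feature_b is pure noise.
rng = random.Random(42)
n_samples = 40
feature_a = [rng.uniform(0.0, 3.0) for _ in range(n_samples)]
feature_b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
target = [math.exp(a) + rng.gauss(0.0, 0.5) for a in feature_a]

results = {}
for name, feature in [("feature_a", feature_a), ("feature_b", feature_b)]:
    rho = spearman_rho(feature, target)
    tau = kendall_tau_a(feature, target)
    p_rho = permutation_p_value(spearman_rho, feature, target)
    p_tau = permutation_p_value(kendall_tau_a, feature, target)
    results[name] = (rho, p_rho, tau, p_tau)
    print(f"{name}: rho={rho:.3f} (p={p_rho:.3f}), tau={tau:.3f} (p={p_tau:.3f})")
```

A ranking built from these statistics can then be set beside a SHAP-based ranking from a fitted model; where the two disagree, the feature may deserve closer scrutiny before it is interpreted as a driver of contamination.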
About the journal:
The Journal of Hazardous Materials serves as a global platform for promoting cutting-edge research in the field of Environmental Science and Engineering. Our publication features a wide range of articles, including full-length research papers, review articles, and perspectives, with the aim of enhancing our understanding of the dangers and risks associated with various materials concerning public health and the environment. It is important to note that the term "environmental contaminants" refers specifically to substances that pose hazardous effects through contamination, while excluding those that do not have such impacts on the environment or human health. Moreover, we emphasize the distinction between wastes and hazardous materials in order to provide further clarity on the scope of the journal. We have a keen interest in exploring specific compounds and microbial agents that have adverse effects on the environment.