Leveraging visible-near-infrared spectroscopy and machine learning to detect nickel contamination in soil: Addressing class imbalances for environmental management
Chongchong Qi, Kechao Li, Min Zhou, Chunhui Zhang, Xiaoming Zheng, Qiusong Chen, Tao Hu
Journal of Hazardous Materials Advances (impact factor 5.4; JCR Q2, Engineering, Environmental). Published 2024-09-30. DOI: 10.1016/j.hazadv.2024.100489. URL: https://www.sciencedirect.com/science/article/pii/S2772416624000901
Citation count: 0
Abstract
Excessive concentrations of Ni in soil have severe effects: they harm human health, lead to disease, and threaten animals and plants. Although the dangers of high Ni concentrations are widely recognized, rapid and large-scale tools for identifying Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and tend to neglect data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance neglected in previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with a high likelihood of Ni contamination. Overall, the model developed in this study offers improved accuracy in predicting soil Ni contamination at the continental scale and can be used to prioritize further testing and guide policymaking.
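The abstract does not say which two sampling methods were used or how the geometric mean was defined, so the following is only a minimal sketch under stated assumptions. It uses random oversampling as a stand-in resampling method and the common definition of the geometric mean as the square root of recall times specificity. The function names `random_oversample` and `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper. Area under the curve is omitted because it requires predicted scores rather than hard labels.

```python
import math
import random


def random_oversample(features, labels, minority_label=1, seed=0):
    """Balance two classes by duplicating randomly chosen minority samples.

    Stand-in for the paper's (unspecified) sampling methods.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority samples to resample")
    # Draw extra minority indices with replacement until both classes match.
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    keep = majority + minority + extra
    return [features[i] for i in keep], [labels[i] for i in keep]


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, specificity, accuracy, and G-mean from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        # Assumed definition: sqrt(recall * specificity).
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical toy data: 10 contaminated (1) and 90 uncontaminated (0) samples.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 9 + [0] * 81
metrics = imbalance_metrics(y_true, y_pred)

# Oversample the minority class on placeholder one-dimensional features.
features = [[float(i)] for i in range(100)]
X_bal, y_bal = random_oversample(features, y_true)
```

On data this skewed, accuracy alone can look strong even when many contaminated samples are missed, which is why recall and the geometric mean are reported alongside it. In practice, resampling would normally be applied to the training split only, so that evaluation reflects the original class ratio.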