Comparative analysis of different machine learning algorithms for predicting trace metal concentrations in soils under intensive paddy cultivation
Mehmet Taşan, Yusuf Demir, Sevda Taşan, Elif Öztürk
Computers and Electronics in Agriculture, Vol. 219, Article 108772 (publication date: 2024-03-02)
DOI: 10.1016/j.compag.2024.108772
https://www.sciencedirect.com/science/article/pii/S0168169924001637
Abstract
Contamination of agricultural soils with trace metals is a concern because it poses potential long-term threats to water resources, aquatic species, and human health. Fast, accurate, and reliable methods are therefore needed to monitor the trace metal content of agricultural soils. This study compared the performance of five machine learning models (Artificial Neural Network, ANN; Deep Neural Network, DNN; Random Forest, RF; K-Nearest Neighbors, KNN; and Adaptive Boosting, AB) in estimating the heavy metal (Cu, Fe, Mn, and Zn) contents of soils that have been under intensive paddy farming for years. Model stability was also investigated. Based on correlation analysis, selected soil physicochemical parameters (EC, pH, Na, K, N) and soil depth were used as covariates to improve the estimation accuracy for soil heavy metals. Model performance was assessed with the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE), and scatter plots, box plots, and Taylor diagrams were used to compare the models graphically. With higher R² and lower RMSE values than the other models, RF gave the most accurate estimates for Cu (RMSE = 1.11 ppm, R² = 0.90), Fe (RMSE = 25.40 ppm, R² = 0.67), and Mn (RMSE = 9.05 ppm, R² = 0.59), while ANN gave the most accurate estimates for Zn (RMSE = 0.35 ppm, R² = 0.49). In terms of stability, AB produced the most stable estimates for Cu and ANN for the other heavy metals: the smallest change in RMSE between the training and testing datasets was 2.5% (AB) for Cu, 10.38% (ANN) for Fe, 21.35% (ANN) for Mn, and 6.79% (ANN) for Zn. Overfitting was observed in the RF model. Sensitivity analysis of the best-performing and most stable models showed that EC, pH, and N in particular had a significant effect on Zn, Cu, Mn, and Fe accumulation in the soils.
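The abstract reports accuracy as R², MAE, and RMSE and judges stability by how much RMSE changes between the training and testing datasets, but gives no formulas. A minimal plain-Python sketch of these quantities follows. RMSE and MAE use their standard definitions; R² is computed as 1 − SS_res/SS_tot, which is an assumption (some studies report the squared Pearson correlation instead); and `rmse_change_percent` is a hypothetical reading of the stability criterion as the absolute train-to-test change in RMSE expressed as a percentage of the training RMSE. All function names are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    """Ensure both series are non-empty and paired one-to-one."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. ppm)."""
    _check(observed, predicted)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data (e.g. ppm)."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def rmse_change_percent(rmse_train: float, rmse_test: float) -> float:
    """Hypothetical stability index: |RMSE_test - RMSE_train| as a percentage of RMSE_train.

    Smaller values mean the training and testing errors agree more closely.
    """
    if rmse_train <= 0:
        raise ValueError("training RMSE must be positive")
    return abs(rmse_test - rmse_train) / rmse_train * 100.0
```

For instance, `rmse_change_percent(2.0, 2.5)` gives 25.0: a model whose RMSE rises from 2.0 ppm in training to 2.5 ppm in testing changes by 25%. Under this reading a smaller percentage marks a more stable model, which is how the abstract ranks AB as most stable for Cu (2.5%).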
The better performance of the ANN models resulted from their better modeling of the complex nonlinear relationships between soil heavy metal contents and the covariates. Based on these findings, it was concluded that artificial intelligence-based methods can be used reliably and successfully to predict the trace metal content of paddy fields.
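The abstract reports that sensitivity analysis singled out EC, pH, and N, but does not describe the procedure. One common model-agnostic option, assumed here rather than taken from the paper, is one-at-a-time (OAT) perturbation: scale a single covariate up and down by a small fraction, hold the others fixed, and record how far the fitted model's prediction moves. In this sketch `predict` is a hypothetical wrapper around any fitted model (ANN, RF, and so on) that maps a dictionary of covariate values to a predicted metal content; the ±10% step and the covariate names are illustrative.

```python
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical wrapper: covariate values -> predicted metal content (e.g. ppm).
Predictor = Callable[[Mapping[str, float]], float]


def oat_sensitivity(
    predict: Predictor,
    samples: Sequence[Mapping[str, float]],
    covariates: Sequence[str],
    step: float = 0.10,
) -> Dict[str, float]:
    """Mean absolute change in the prediction when one covariate is scaled by
    (1 + step) and by (1 - step) while all other covariates are held fixed."""
    if not samples:
        raise ValueError("samples must not be empty")
    scores: Dict[str, float] = {}
    for name in covariates:
        total = 0.0
        for row in samples:
            base = predict(row)
            for factor in (1.0 + step, 1.0 - step):
                perturbed = dict(row)  # copy so the original sample is untouched
                perturbed[name] = row[name] * factor
                total += abs(predict(perturbed) - base)
        # Average over all samples and both perturbation directions.
        scores[name] = total / (2 * len(samples))
    return scores


def rank_covariates(scores: Mapping[str, float]) -> List[str]:
    """Covariate names ordered from most to least influential."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

For a nonlinear model such as an ANN, the scores depend on where the perturbed samples lie, so they would normally be averaged over a whole dataset rather than a single row. Because the scores are in the units of the prediction, they rank covariates for one metal at a time.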
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.