Preprocessing and regression approaches alter the spectral estimation accuracy of plant phosphorus content—A three-level meta-analysis
Tianli Wang, Yi Zhang, Fei Li, Ning Cao
Computers and Electronics in Agriculture, Vol. 234, Article 110205 (published 2025-03-06)
DOI: 10.1016/j.compag.2025.110205
https://www.sciencedirect.com/science/article/pii/S0168169925003114
Citations: 0
Abstract
Remote sensing technology and machine learning methods are being scaled up globally to predict nutrient content from spectral data. However, rigorous comparisons of the co-benefits delivered by different factors are lacking, and because the factors influencing prediction models have not been analysed sufficiently, the accuracy of the final models is unstable. This is a particular problem for nutrients such as phosphorus, whose visual symptoms are often not obvious or appear only after a lag. This study therefore proposed a three-level meta-analysis model to extract and analyse data from a large number of studies, examining the various influencing factors in depth to fill this knowledge gap. In a global synthesis, the three-level meta-analysis was applied to seven validated datasets of multispectral remote-sensing field observations (32 effect sizes) and 46 datasets of hyperspectral remote-sensing field observations (630 effect sizes). Heterogeneity in the three-level meta-analysis was explored with Meta-Forest, a new machine learning method, and interaction effects among moderating variables were explored with Meta-Cart. A comprehensive analysis of the literature published over the past 25 years established the importance of matching preprocessing and regression methods when predicting plant phosphorus from spectral responses. The combination of preprocessing and regression methods is particularly important for predicting phosphorus concentration at the regional scale. Baseline calibration effectively removes background noise at the regional scale, but it cannot resolve the redundancy within hyperspectral data; it must be combined with a regression method that handles such redundancy effectively if model accuracy is to improve.
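The three-level structure can be written as y_ij = β0 + u_j + w_ij + e_ij, where e_ij is the sampling error of effect size i from dataset j (known variance v_ij), w_ij is within-dataset variation (level 2) and u_j is between-dataset variation (level 3). The sketch below computes the generalised-least-squares pooled estimate under this model, plus the level-2 and level-3 I² shares in the form commonly used for three-level models. It is a minimal illustration, not the authors' code: it assumes both variance components are already known, whereas in practice they are estimated, typically by REML (for example with `rma.mv` in the R package metafor). The function names and toy numbers are hypothetical.

```python
import numpy as np


def three_level_pooled_estimate(y, v, cluster, sigma2_within, sigma2_between):
    """Pooled effect and its standard error under an intercept-only three-level model.

    y_ij = beta0 + u_j + w_ij + e_ij with
      e_ij ~ N(0, v_ij)            level 1: known sampling variance
      w_ij ~ N(0, sigma2_within)   level 2: effect sizes within a dataset
      u_j  ~ N(0, sigma2_between)  level 3: between datasets
    The variance components are treated as known (in practice: REML estimates).
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    cluster = np.asarray(cluster)
    info = 0.0   # running sum of 1' V_j^-1 1
    score = 0.0  # running sum of 1' V_j^-1 y_j
    for c in np.unique(cluster):
        idx = cluster == c
        k = int(idx.sum())
        # Effect sizes from the same dataset share u_j, so they covary by sigma2_between.
        V = np.full((k, k), sigma2_between) + np.diag(sigma2_within + v[idx])
        v_inv_one = np.linalg.solve(V, np.ones(k))
        info += v_inv_one.sum()
        score += v_inv_one @ y[idx]
    return score / info, np.sqrt(1.0 / info)


def i2_by_level(v, sigma2_within, sigma2_between):
    """Proportion of total variance at level 2 and level 3.

    Uses the Higgins-Thompson 'typical' sampling variance as the level-1 term."""
    w = 1.0 / np.asarray(v, dtype=float)
    k = w.size
    typical_v = (k - 1) * w.sum() / (w.sum() ** 2 - np.sum(w ** 2))
    total = sigma2_within + sigma2_between + typical_v
    return sigma2_within / total, sigma2_between / total


# Made-up toy values, for illustration only.
y = [0.62, 0.70, 0.55, 0.81, 0.77, 0.40]
v = [0.010, 0.012, 0.015, 0.008, 0.009, 0.020]
dataset = ["A", "A", "B", "B", "B", "C"]
est, se = three_level_pooled_estimate(y, v, dataset, sigma2_within=0.005, sigma2_between=0.010)
print(f"pooled estimate {est:.3f} (SE {se:.3f})")
print("I2 level 2, level 3:", i2_by_level(v, 0.005, 0.010))
```

Setting both variance components to zero reduces the estimate to the ordinary inverse-variance weighted mean, which is a quick sanity check on the covariance construction.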
Nonlinear non-parametric regression better captures the complex nonlinear relationship between phosphorus concentration and spectral data and is robust to the quantity and quality of the data and to heterogeneity among study areas; it therefore offers excellent predictive ability. For multispectral data, the type of spectrometer is crucial for predicting regional phosphorus concentrations, especially when data are collected by drone. By identifying the optimal choice of data preprocessing and regression methods, this study provides guidance for making full use of spectral data and for building fast, efficient, non-destructive models to predict plant phosphorus (P) concentration.
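The preprocessing and regression findings summarised above can be made concrete with a toy pipeline. This is an assumption-laden illustration, not the authors' method: the abstract does not name a specific baseline-calibration algorithm or regressor. The sketch subtracts a straight-line baseline (a simple stand-in for baseline calibration), shows with PCA that the corrected bands remain highly redundant, compresses them onto a few principal components, and then fits k-nearest-neighbour regression as one example of a nonlinear non-parametric regressor. The function names, the synthetic spectra, and all parameter values (k = 5, three components, a single absorption feature at 700 nm) are hypothetical.

```python
import numpy as np


def linear_baseline_correct(spectra):
    """Subtract, per spectrum, the straight line joining its first and last band.

    A simple stand-in for baseline calibration: removes additive offsets and
    linear tilts, but does nothing about correlation between bands."""
    spectra = np.asarray(spectra, dtype=float)
    t = np.linspace(0.0, 1.0, spectra.shape[1])
    start, end = spectra[:, [0]], spectra[:, [-1]]
    return spectra - (start + (end - start) * t)


def pca_scores(x, n_components):
    """Project mean-centred rows of x onto their leading principal components."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of variance per component
    return u[:, :n_components] * s[:n_components], explained


def knn_loo_rmse(features, target, k=5):
    """Leave-one-out RMSE of k-nearest-neighbour regression (mean of k neighbours)."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample may not predict itself
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = target[nearest].mean(axis=1)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Hypothetical synthetic data for illustration only (not from the study).
rng = np.random.default_rng(0)
n, n_bands = 120, 200
wl = np.linspace(400.0, 1000.0, n_bands)              # wavelength grid, nm
p = rng.uniform(1.0, 5.0, n)                          # plant P, arbitrary units
band = np.exp(-(((wl - 700.0) / 40.0) ** 2))          # one absorption feature
depth = 0.02 * p**2                                   # nonlinear P-to-spectrum link
offset = rng.uniform(0.1, 0.3, (n, 1))
slope = rng.uniform(-0.1, 0.1, (n, 1))
background = offset + slope * np.linspace(0.0, 1.0, n_bands)  # sloped background
spectra = background - depth[:, None] * band + rng.normal(0.0, 0.001, (n, n_bands))

corrected = linear_baseline_correct(spectra)
scores, explained = pca_scores(corrected, n_components=3)
rmse = knn_loo_rmse(scores, p, k=5)
print(f"variance in first PC: {explained[0]:.3f}; kNN LOO RMSE: {rmse:.3f}")
```

Even after the background is removed, almost all of the variance sits in one component, which is the band redundancy that baseline calibration alone cannot address. Partial least squares regression and random forests are often used for this kind of data; kNN appears here only because it fits in a few lines of NumPy.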
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.