Context
The random forest (RF) model has been widely applied for crop yield prediction. However, extrapolation, measurement errors, and uncertainty arising from the limited predictive power of covariates may affect model performance.
Objective
This study aimed to interpret and assess the accuracy of RF for potato yield prediction in China and to quantify the main sources of uncertainty using C.T. de Wit’s three-quadrant diagram.
Methods
A dataset of 2182 plot-year combinations was derived from 63 potato field experiments covering nine Chinese provinces and three years. Model performance was evaluated by 10-fold cross-validation (CV) and by leave-block-out (LBOCV), leave-site-out (LSOCV), and leave-year-out (LYOCV) cross-validation.
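The grouped schemes differ from 10-fold CV in that whole groups (blocks, sites, or years) are held out together, so the model is tested on conditions it has not seen. A minimal sketch of such a splitter follows; the toy records, the field layout, and the function name are hypothetical and are not taken from the study's dataset or code.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group label."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        held = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held]
        yield g, train_idx, test_idx


# Hypothetical toy records (site, year); illustrative only.
records = [
    ("site_A", 2017), ("site_A", 2017),
    ("site_B", 2017), ("site_B", 2018),
    ("site_C", 2018), ("site_C", 2019),
]

# Leave-site-out: every record from one site forms the test set.
site_splits = list(leave_one_group_out([site for site, _ in records]))

# Leave-year-out: every record from one year forms the test set.
year_splits = list(leave_one_group_out([year for _, year in records]))
```

With scikit-learn available, `sklearn.model_selection.LeaveOneGroupOut` provides the same kind of split; whether the study used it is not stated here.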
Results
The root mean square error (RMSE) was 3.5, 8.3, 9.9, and 10.3 t ha−1, and the model efficiency coefficient (MEC) was 0.92, 0.64, 0.52, and 0.43 for 10-fold CV, LBOCV, LSOCV, and LYOCV, respectively. Cumulative sunshine duration and topographic position index were the most important covariates, while fertiliser variables were the least important for yield modelling. The standard deviation of yield replicate variability, estimated by a linear model, accounted for 32 % of the RMSE for LSOCV. Introducing measured uptake of nutrient omission treatments, uptake of all treatments, and yields of nutrient omission treatments as additional covariates decreased the LSOCV RMSE by 2.3 t ha−1 on average.
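For readers unfamiliar with the two metrics, the sketch below computes RMSE and MEC on made-up numbers. It assumes MEC follows the common Nash-Sutcliffe form, one minus the ratio of the residual sum of squares to the total sum of squares around the observed mean; the abstract does not spell out the formula, and the observed and predicted values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mec(obs, pred):
    """Model efficiency coefficient (assumed Nash-Sutcliffe form).

    1 is a perfect fit; 0 means no better than predicting the observed mean.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical yields in t ha-1; not data from the study.
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 43.0, 47.0]

toy_rmse = rmse(observed, predicted)
toy_mec = mec(observed, predicted)
```

Under this reading, the reported 32 % share of the LSOCV RMSE of 9.9 t ha−1 corresponds to a replicate standard deviation of roughly 0.32 × 9.9 ≈ 3.2 t ha−1.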
Conclusions
The fitted models explained up to 92 % of potato yield variability in China, although a considerable residual error remained when extrapolating to other areas or years. Yield replicate variability accounted for one-third of the residual error. Information about physiological efficiency was the main source of uncertainty, followed by information about available soil nutrients. Fertiliser recovery was least important because most of the experiments were conducted in fertile fields.
Implications
Combining an RF model with the three-quadrant diagram makes it possible to better explain yield prediction uncertainty. The methodology used in this study can be applied to other crops, countries, and data-driven models.