Jump around: Selecting Markov Chain Monte Carlo parameters and diagnostics for improved food web model quality and ecosystem representation
Gemma Gerber, Ursula M. Scharler
Ecological Informatics, volume 84, Article 102865 (published 2024-10-24)
DOI: 10.1016/j.ecoinf.2024.102865
https://www.sciencedirect.com/science/article/pii/S1574954124004072
Abstract
Capturing ecological data variability in food web models is an important step for improving model representation of empirical systems. One approach is to use linear inverse modelling and Markov Chain Monte Carlo (LIM-MCMC) techniques to set up an inverse LIM problem using empirical data constraints, and then sample multiple plausible food webs from the inverse problem using an MCMC algorithm. We describe the set of plausible food webs as an ‘ensemble’ of solutions to the inverse problem sampled with the LIM-MCMC algorithm. The extent of data variability eventually integrated into an ensemble depends on how well the LIM-MCMC algorithm samples the solution space. Algorithm quality can be adjusted via user-defined parameters describing starting points, jump sizes, and number of iterations or food webs produced. However, little information exists on how each LIM-MCMC algorithm parameter affects the degree of empirical data variability introduced into the ensemble. Further, post hoc algorithm quality diagnostics with commonly used trace plots and the coefficient of variation (CoV) rarely address critical aspects of algorithm quality, such as (1) whether the returned ensemble successfully targeted the solution space distribution (stationarity), (2) how correlated the ensemble solutions are (mixing), and (3) whether the ensemble contains enough solutions to adequately capture input data variability (sampling efficiency).
Therefore, we used several established MCMC convergence diagnostics to (1) quantify how algorithm parameters affect ensemble flow values and whether these differences propagate to ecological indicators and (2) evaluate algorithm quality and compare it to current evaluation and ecosystem modelling methods. We applied 30 LIM-MCMC algorithm combinations of varying starting points, jump sizes, and number of iterations to solve food web ensembles from a single food web model. We analysed ensembles with Ecological Network Analysis (ENA) to calculate indicators describing system function. Results show that LIM-MCMC algorithm parameters, in particular the jump size, affect ensemble flow values, and that these effects propagate to ecological indicators, which then describe different ecosystem functioning for the same model. Thereafter, comparisons of post hoc diagnostics show that MCMC convergence diagnostics provided more robust estimates of algorithm quality than trace plots and CoV. Together, these findings underpin several novel recommendations to enhance LIM-MCMC algorithm parameter selection and quality assessments applicable to any ecological ensemble network study.
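To make the roles of starting point, jump size, and number of iterations concrete, the following is a minimal sketch only, not the authors' method or any particular LIM-MCMC implementation. It assumes a hypothetical two-flow solution space bounded by simple inequality constraints, and it swaps in a plain random-walk sampler that rejects infeasible proposals. The function names (`feasible`, `sample_ensemble`, `coefficient_of_variation`, `effective_sample_size`) and all constraint values are illustrative assumptions. The effective sample size shown here is one common way to summarise mixing and sampling efficiency; the abstract does not say which diagnostics the study used.

```python
import random
import statistics


def feasible(x):
    """Hypothetical solution space: two flows, 0 <= x_i <= 10 and x_0 + x_1 <= 12."""
    return all(0.0 <= v <= 10.0 for v in x) and x[0] + x[1] <= 12.0


def sample_ensemble(start, jump_size, n_iter, seed=0):
    """Random-walk sampler targeting the uniform distribution on the toy space.

    start     -- feasible starting point
    jump_size -- standard deviation of the Gaussian proposal step
    n_iter    -- number of iterations (one ensemble member per iteration)
    """
    if not feasible(start):
        raise ValueError("starting point must satisfy the constraints")
    rng = random.Random(seed)
    x = list(start)
    ensemble = []
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, jump_size) for v in x]
        # Symmetric proposal and uniform target: accept whenever feasible,
        # otherwise stay at the current point.
        if feasible(proposal):
            x = proposal
        ensemble.append(tuple(x))
    return ensemble


def coefficient_of_variation(values):
    """CoV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def effective_sample_size(values):
    """ESS = n / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple and deliberately crude rule chosen for this sketch.
    """
    n = len(values)
    mean = statistics.fmean(values)
    centred = [v - mean for v in values]
    var = sum(c * c for c in centred) / n
    if var == 0.0:
        return 0.0  # the chain never moved
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    # Compare three jump sizes from the same starting point.
    for jump in (0.05, 1.0, 5.0):
        ensemble = sample_ensemble(start=(5.0, 5.0), jump_size=jump, n_iter=2000)
        flow = [x[0] for x in ensemble]
        print(jump,
              round(coefficient_of_variation(flow), 3),
              round(effective_sample_size(flow), 1))
```

In this toy setting, a very small jump size yields strongly autocorrelated solutions, so the ensemble holds far fewer effectively independent food webs than its nominal size suggests. CoV alone does not report that correlation, which is why an autocorrelation-based measure is computed alongside it.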
About the journal:
The journal Ecological Informatics is devoted to the publication of high quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.