Lucas Damián Gorné, Jesús Aguirre-Gutiérrez, Fernanda C. Souza, Nathan G. Swenson, Nathan Jared Boardman Kraft, Beatriz Schwantes Marimon, Timothy R. Baker, Renato A. Ferreira de Lima, Emilio Vilanova, Esteban Álvarez-Dávila, Abel Monteagudo Mendoza, Gerardo Rafael Flores Llampazo, Rubens Manoel dos Santos, Gerhard Boenisch, Alejandro Araujo-Murakami, Gonzalo Rivas-Torres, Hirma Ramírez-Angulo, Nayane Cristina dos Santos Prestes, Paulo S. Morandi, Sabina Cerruto Ribeiro, Wesley Jonatar A. da Cruz, Mathias Disney, Anthony Di Fiore, Ben Hur Marimon-Junior, Ted R. Feldpausch, Yadvinder Malhi, Oliver L. Phillips, David Galbraith, Sandra Díaz
Ecography. Published 2025-02-04. DOI: 10.1111/ecog.07520 (https://doi.org/10.1111/ecog.07520)
Use and misuse of trait imputation in ecology: the problem of using out‐of‐context imputed values
Despite the progress in the measurement and accessibility of plant trait information, acquiring sufficiently complete data from enough species to answer broad‐scale questions in plant functional ecology and biogeography remains challenging. A common way to overcome this challenge is by imputation, or ‘gap‐filling’, of trait values. This has proven appropriate when focusing on the overall patterns emerging from the database being imputed. However, some applications force the imputation procedure out of its original scope, using imputed values independently from the imputation context, and specific trait values for a given species are used as input for computing new variables. We tested the performance of three widely used imputation methods (Bayesian hierarchical probabilistic matrix factorization, multiple imputation by chained equations with predictive mean matching, and Rphylopars) on a database of tropical tree and shrub traits. By applying a leave‐one‐out procedure, we assessed the accuracy and precision of the imputed values and found that out‐of‐context use of imputed values may bias the estimation of different variables. We also found that low redundancy (i.e. low predictability of a new value on the basis of existing values) in the dataset, not uncommon for empirical datasets, is likely the main cause of low accuracy and precision in the imputed values. We therefore suggest the use of a leave‐one‐out procedure to test the quality of the imputed values before any out‐of‐context application of the imputed values, and make practical recommendations to avoid the misuse of imputation procedures. Furthermore, we recommend not publishing gap‐filled datasets, publishing instead only the empirical data, together with the imputation method applied and the corresponding script to reproduce the imputation. This will help avoid the spread of imputed data, whose accuracy, precision, and source are difficult to assess and track, into the public domain.
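The leave‐one‐out check described in the abstract can be illustrated with a minimal sketch. This is not the authors' code and does not use any of the three methods they tested: the column-mean imputer is a deliberately simple stand-in, and the function names (`mean_impute`, `leave_one_out`, `summarize`), the toy trait matrix, and the choice of mean error and RMSE as summaries are all assumptions made for illustration. The idea is to hide each observed trait value in turn, impute it from the remaining data, and compare the imputed value with the one that was measured.

```python
import numpy as np


def mean_impute(data):
    """Stand-in imputer (assumption): fill each NaN with its column (trait) mean."""
    filled = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = col_means[cols]
    return filled


def leave_one_out(data, impute):
    """Mask each observed cell in turn, re-impute, and collect (observed, imputed) pairs."""
    observed, imputed = [], []
    for i, j in zip(*np.where(~np.isnan(data))):
        masked = data.copy()
        masked[i, j] = np.nan
        # Skip cells whose removal would leave the trait with no observations.
        if np.all(np.isnan(masked[:, j])):
            continue
        imputed.append(impute(masked)[i, j])
        observed.append(data[i, j])
    return np.array(observed), np.array(imputed)


def summarize(observed, imputed):
    """Illustrative summaries (assumption): mean error and root-mean-square error."""
    errors = imputed - observed
    return {"bias": float(errors.mean()), "rmse": float(np.sqrt((errors ** 2).mean()))}


if __name__ == "__main__":
    # Hypothetical species-by-trait matrix; NaN marks a missing measurement.
    traits = np.array([
        [1.0, 10.0],
        [2.0, np.nan],
        [3.0, 30.0],
        [np.nan, 20.0],
    ])
    obs, imp = leave_one_out(traits, mean_impute)
    print(summarize(obs, imp))
```

Any imputer that takes a matrix with missing cells and returns a completed matrix could be passed in place of `mean_impute`, so the same loop can be used to compare methods on the same empirical dataset before imputed values are used elsewhere.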
About the journal:
ECOGRAPHY publishes exciting, novel, and important articles that significantly advance understanding of ecological or biodiversity patterns in space or time. Papers focusing on conservation or restoration are welcomed, provided they are anchored in ecological theory and convey a general message that goes beyond a single case study. We encourage papers that seek to advance the field through the development and testing of theory or methodology, or by proposing new tools for analysis or interpretation of ecological phenomena. Manuscripts are expected to address general principles in ecology, though they may do so using a specific model system if they adequately frame the problem relative to a generalized ecological question or problem.
Purely descriptive papers are considered only if breaking new ground and/or describing patterns seldom explored. Studies focused on a single species or single location are generally discouraged unless they make a significant contribution to advancing general theory or understanding of biodiversity patterns and processes. Manuscripts merely confirming or marginally extending results of previous work are unlikely to be considered in Ecography.
Papers are judged by virtue of their originality, appeal to general interest, and their contribution to new developments in studies of spatial and temporal ecological patterns. There are no biases with regard to taxon, biome, or biogeographical area.