Cross validation for model selection: A review with examples from ecology
Luke A. Yates, Zach Aandahl, Shane A. Richards, Barry W. Brook
Ecological Monographs, volume 93, issue 1. Published 2022-11-13. DOI: 10.1002/ecm.1557
https://onlinelibrary.wiley.com/doi/10.1002/ecm.1557
Citations: 22
Abstract
Specifying, assessing, and selecting among candidate statistical models is fundamental to ecological research. Commonly used approaches to model selection are based on predictive scores and include cross validation and information criteria such as Akaike's information criterion. Based on data splitting, cross validation is particularly versatile because it can be used even when it is not possible to derive a likelihood (e.g., many forms of machine learning) or count parameters precisely (e.g., mixed-effects models). However, much of the literature on cross validation is technical and spread across statistical journals, making it difficult for ecological analysts to assess and choose among the wide range of options. Here we provide a comprehensive, accessible review that explains important but often overlooked technical aspects of cross validation for model selection, such as bias correction, estimation uncertainty, choice of scores, and selection rules to mitigate overfitting. We synthesize the relevant statistical advances to make recommendations for the choice of cross-validation technique, and we present two ecological case studies to illustrate their application. In most instances, we recommend using exact or approximate leave-one-out cross validation to minimize bias, or otherwise k-fold cross validation with bias correction if k < 10. To mitigate overfitting when using cross validation, we recommend calibrated selection via our recently introduced modified one-standard-error rule. We advocate for the use of predictive scores in model selection across a range of typical modeling goals, such as exploration, hypothesis testing, and prediction, provided that models are specified in accordance with the stated goal. We also emphasize, as others have done, that inference on parameter estimates is biased if preceded by model selection and instead requires a carefully specified single model or further technical adjustments.
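To make the ideas in the abstract concrete, the following is a minimal sketch of k-fold cross validation followed by a one-standard-error style selection rule. It is illustrative only and is not the authors' implementation: the polynomial regression models, the squared-error score, the simulated data, and the function names `kfold_pointwise_losses` and `select_within_one_se` are all assumptions introduced here. In particular, using the standard error of the paired score differences is one reading of the modified rule the abstract refers to, and bias correction is omitted for brevity.

```python
import numpy as np

def kfold_pointwise_losses(x, y, degree, k, seed=0):
    """Return one squared-error loss per observation, each computed from a
    polynomial fit that did not see that observation (k-fold CV).
    Setting k = len(y) gives exact leave-one-out cross validation."""
    n = len(y)
    # The same seed gives the same folds for every model, so losses are paired.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    losses = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coefs = np.polyfit(x[train], y[train], degree)
        losses[test_idx] = (y[test_idx] - np.polyval(coefs, x[test_idx])) ** 2
    return losses

def select_within_one_se(losses_by_model):
    """losses_by_model: pointwise-loss arrays ordered from simplest to most
    complex model. Returns the index of the simplest model whose mean loss is
    within one standard error of the best model's, where the standard error
    is that of the paired differences (an assumption of this sketch)."""
    means = np.array([losses.mean() for losses in losses_by_model])
    best = int(np.argmin(means))
    n = len(losses_by_model[best])
    for m, losses in enumerate(losses_by_model):
        diff = losses - losses_by_model[best]
        se_diff = diff.std(ddof=1) / np.sqrt(n)
        if diff.mean() <= se_diff:
            return m
    return best

# Hypothetical data: a linear trend with noise, candidate degrees 0 to 5.
data_rng = np.random.default_rng(1)
x = data_rng.uniform(-1.0, 1.0, 60)
y = 1.0 + 2.0 * x + data_rng.normal(0.0, 0.5, 60)

losses = [kfold_pointwise_losses(x, y, degree, k=10) for degree in range(6)]
best = int(np.argmin([l.mean() for l in losses]))
chosen = select_within_one_se(losses)
print("lowest CV score: degree", best, "| selected: degree", chosen)
```

Because the selected model is the simplest one that is statistically indistinguishable from the best-scoring model, `chosen` can never exceed `best`; this is how a rule of this kind guards against overfitting when several models have similar cross-validation scores.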
About the journal:
The vision for Ecological Monographs is that it should be the place for publishing integrative, synthetic papers that elaborate new directions for the field of ecology.
Original Research Papers published in Ecological Monographs will continue to document complex observational, experimental, or theoretical studies that by their very integrated nature defy dissolution into shorter publications focused on a single topic or message.
Reviews will be comprehensive and synthetic papers that establish new benchmarks in the field, define directions for future research, contribute to fundamental understanding of ecological principles, and derive principles for ecological management in its broadest sense (including, but not limited to: conservation, mitigation, restoration, and pro-active protection of the environment). Reviews should reflect the full development of a topic and encompass relevant natural history, observational and experimental data, analyses, models, and theory. Reviews published in Ecological Monographs should further blur the boundaries between “basic” and “applied” ecology.
Concepts and Synthesis papers will conceptually advance the field of ecology. These papers are expected to go well beyond works being reviewed and include discussion of new directions, new syntheses, and resolutions of old questions.
In this world of rapid scientific advancement and never-ending environmental change, there needs to be room for the thoughtful integration of scientific ideas, data, and concepts that feeds the mind and guides the development of the maturing science of ecology. Ecological Monographs provides that room, with an expansive view to a sustainable future.