Michela Leonardi, Margherita Colucci, Andrea Vittorio Pozzi, Eleanor M. L. Scerri, Andrea Manica
tidysdm: Leveraging the flexibility of tidymodels for species distribution modelling in R
Methods in Ecology and Evolution, published 2024-09-10
DOI: 10.1111/2041-210x.14406
Citations: 0
Abstract
In species distribution modelling (SDM), it is common practice to explore multiple machine learning (ML) algorithms and combine their results into ensembles. In R, many implementations of different ML algorithms are available but, as they were mostly developed independently, they often use inconsistent syntax and data structures. For this reason, repeating an analysis with multiple algorithms and combining their results can be challenging.

Specialised SDM packages solve this problem by providing a simpler, unified interface that wraps the original functions to meet each specific requirement. However, creating and maintaining such interfaces is time‐consuming, and with this approach the user cannot easily integrate other methods that may become available.

Here, we present tidysdm, an R package that solves this problem by taking advantage of the tidymodels universe. tidymodels provides a standardised grammar, data structures and modelling interfaces, as well as a well‐documented infrastructure for integrating new algorithms and metrics. Because tidymodels is widely adopted, most ML algorithms and metrics are already integrated, and users can add others. Moreover, new statistical approaches tend to be implemented in tidymodels quickly, so they can easily be integrated into existing pipelines and analyses.

tidysdm takes advantage of the tidymodels universe to provide a flexible and fully customisable pipeline for fitting SDMs. It includes SDM‐specific algorithms and metrics, as well as methods that facilitate the use of spatial data within tidymodels.

Additionally, tidysdm is the first software that natively allows SDM to be performed using data from different time periods, expanding the availability of SDM to scholars working in palaeontology, archaeology, palaeobiology, palaeoecology and other disciplines focussing on the past.
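The core idea above, wrapping independently developed algorithms behind one shared interface so they can be fitted in turn and combined into an ensemble, can be illustrated outside R. The Python below is a minimal sketch under stated assumptions, not tidysdm's API (tidysdm is an R package built on tidymodels). Everything in it is hypothetical: the two toy algorithms with deliberately inconsistent interfaces (`train_logit`/`logit_score`, and a Bioclim-style `EnvelopeModel`), the adapter classes, `ensemble_predict`, and the toy data. The ensemble here is simply an unweighted mean of predicted suitabilities.

```python
import math
from typing import List, Protocol, Sequence

Row = Sequence[float]


# --- Two hypothetical, "independently developed" algorithms with inconsistent APIs ---

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit_score(weights: List[float], row: Row) -> float:
    # weights = [bias, w1, w2, ...]
    return _sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def train_logit(features: List[Row], labels: List[int],
                lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Logistic regression fitted by batch gradient descent (function-based API)."""
    w = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = logit_score(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w


class EnvelopeModel:
    """Bioclim-style climatic envelope (object-based API, presences only)."""

    def train(self, presences: List[Row]) -> None:
        cols = list(zip(*presences))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]

    def suitability(self, point: Row) -> float:
        # Fraction of variables falling inside the presence range.
        inside = sum(lo <= v <= hi for v, lo, hi in zip(point, self.lo, self.hi))
        return inside / len(point)


# --- A unified interface plus one adapter per algorithm ---

class SDMModel(Protocol):
    def fit(self, X: List[Row], y: List[int]) -> "SDMModel": ...
    def predict_proba(self, X: List[Row]) -> List[float]: ...


class LogitAdapter:
    def fit(self, X: List[Row], y: List[int]) -> "LogitAdapter":
        self.w = train_logit(X, y)
        return self

    def predict_proba(self, X: List[Row]) -> List[float]:
        return [logit_score(self.w, x) for x in X]


class EnvelopeAdapter:
    def fit(self, X: List[Row], y: List[int]) -> "EnvelopeAdapter":
        self.model = EnvelopeModel()
        self.model.train([x for x, label in zip(X, y) if label == 1])
        return self

    def predict_proba(self, X: List[Row]) -> List[float]:
        return [self.model.suitability(x) for x in X]


def ensemble_predict(models: List[SDMModel], X_train: List[Row],
                     y_train: List[int], X_new: List[Row]) -> List[float]:
    """Fit every model through the same interface and average their predictions."""
    preds = [m.fit(X_train, y_train).predict_proba(X_new) for m in models]
    return [sum(col) / len(col) for col in zip(*preds)]


if __name__ == "__main__":
    # Hypothetical toy data: two environmental variables already scaled to [0, 1];
    # 1 = presence, 0 = absence (or pseudo-absence).
    X = [[0.8, 0.7], [0.9, 0.6], [0.7, 0.8], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    new_sites = [[0.8, 0.7], [0.1, 0.1]]
    print(ensemble_predict([LogitAdapter(), EnvelopeAdapter()], X, y, new_sites))
```

In this sketch, adding a third algorithm only requires one more adapter; `ensemble_predict` and any downstream evaluation stay unchanged, which is the kind of benefit the abstract attributes to a shared modelling interface.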
Journal description:
A British Ecological Society journal, Methods in Ecology and Evolution (MEE) promotes the development of new methods in ecology and evolution, and facilitates their dissemination and uptake by the research community. MEE brings together papers from previously disparate sub-disciplines to provide a single forum for tracking methodological developments in all areas.
MEE publishes methodological papers in any area of ecology and evolution, including:
- Phylogenetic analysis
- Statistical methods
- Conservation & management
- Theoretical methods
- Practical methods, including lab and field

This list is not exhaustive, and we welcome enquiries about possible submissions. Methods are defined in the widest terms and may be analytical, practical or conceptual.
A primary aim of the journal is to maximise the uptake of techniques by the community. We recognise that a major stumbling block in the uptake and application of new methods is their accessibility. For example, users may need computer code, example applications or demonstrations of methods.