Treating gaps and biases in biodiversity data as a missing data problem.

IF 11 | Zone 1, Biology | Q1 BIOLOGY | Biological Reviews | Pub Date: 2024-08-08 | DOI: 10.1111/brv.13127

Diana E Bowler, Robin J Boyd, Corey T Callaghan, Robert A Robinson, Nick J B Isaac, Michael J O Pocock


Citations: 0

Abstract

Treating gaps and biases in biodiversity data as a missing data problem.

Big biodiversity data sets have great potential for monitoring and research because of their large taxonomic, geographic and temporal scope. Such data sets have become especially important for assessing temporal changes in species' populations and distributions. Gaps in the available data, especially spatial and temporal gaps, often mean that the data are not representative of the target population. This hinders drawing large-scale inferences, such as about species' trends, and may lead to misplaced conservation action. Here, we conceptualise gaps in biodiversity monitoring data as a missing data problem, which provides a unifying framework for the challenges and potential solutions across different types of biodiversity data sets. We characterise the typical types of data gaps as different classes of missing data and then use missing data theory to explore the implications for questions about species' trends and factors affecting occurrences/abundances. By using this framework, we show that bias due to data gaps can arise when the factors affecting sampling and/or data availability overlap with those affecting species. But a data set per se is not biased. The outcome depends on the ecological question and statistical approach, which determine choices around which sources of variation are taken into account. We argue that typical approaches to long-term species trend modelling using monitoring data are especially susceptible to data gaps since such models do not tend to account for the factors driving missingness. To identify general solutions to this problem, we review empirical studies and use simulation studies to compare some of the most frequently employed approaches to deal with data gaps, including subsampling, weighting and imputation. All these methods have the potential to reduce bias but may come at the cost of increased uncertainty of parameter estimates. 
Weighting techniques are arguably the least used so far in ecology and have the potential to reduce both the bias and variance of parameter estimates. Regardless of the method, the ability to reduce bias critically depends on knowledge of, and the availability of data on, the factors creating data gaps. We use this review to outline the necessary considerations when dealing with data gaps at different stages of the data collection and analysis workflow.
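The weighting idea mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation study. It assumes a hypothetical population of sites in which accessibility drives both abundance and the chance of being surveyed, and it assumes the sampling probabilities are known exactly, whereas in practice they would have to be estimated from data on the factors creating the gaps. All names (`simulate_sites`, `naive_mean`, `weighted_mean`) and all numeric values are illustrative.

```python
import random


def simulate_sites(n_sites=2000, seed=1):
    """Build a hypothetical target population of sites.

    Accessibility affects both abundance and the probability of being
    surveyed; this overlap is what makes the sampled data unrepresentative.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        accessible = rng.random() < 0.2            # about 20% of sites are accessible
        mean_abundance = 10.0 if accessible else 2.0
        abundance = max(0.0, rng.gauss(mean_abundance, 1.0))
        p_sampled = 0.75 if accessible else 0.05   # accessible sites are over-sampled
        sampled = rng.random() < p_sampled
        sites.append({
            "abundance": abundance,
            "p_sampled": p_sampled,
            "sampled": sampled,
        })
    return sites


def naive_mean(sites):
    """Unweighted mean abundance over the surveyed sites only."""
    observed = [s["abundance"] for s in sites if s["sampled"]]
    return sum(observed) / len(observed)


def weighted_mean(sites):
    """Inverse-probability-weighted mean over the surveyed sites.

    Each surveyed site is weighted by 1 / p_sampled, so rarely surveyed
    site types count for more. Probabilities are assumed known here.
    """
    numerator = 0.0
    denominator = 0.0
    for s in sites:
        if s["sampled"]:
            weight = 1.0 / s["p_sampled"]
            numerator += weight * s["abundance"]
            denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    sites = simulate_sites()
    truth = sum(s["abundance"] for s in sites) / len(sites)
    print(f"true mean abundance:     {truth:.2f}")
    print(f"naive (sampled sites):   {naive_mean(sites):.2f}")
    print(f"inverse-prob. weighted:  {weighted_mean(sites):.2f}")
```

In this setup the naive mean is pulled toward the over-sampled accessible sites, while the weighted mean should land much closer to the population value. If the factor driving missingness were unobserved, the weights could not be constructed, which mirrors the abstract's point that bias reduction depends on knowing what creates the data gaps.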

Source journal

Biological Reviews (Biology)

CiteScore: 21.30
Self-citation rate: 2.00%
Articles published:
Review time: 6-12 weeks

Journal description: Biological Reviews is a scientific journal that covers a wide range of topics in the biological sciences. It publishes several review articles per issue, which are aimed at both non-specialist biologists and researchers in the field. The articles are scholarly and include extensive bibliographies. Authors are instructed to be aware of the diverse readership and write their articles accordingly. The reviews in Biological Reviews serve as comprehensive introductions to specific fields, presenting the current state of the art and highlighting gaps in knowledge. Each article can be up to 20,000 words long and includes an abstract, a thorough introduction, and a statement of conclusions. The journal focuses on publishing synthetic reviews, which are based on existing literature and address important biological questions. These reviews are interesting to a broad readership and are timely, often related to fast-moving fields or new discoveries. A key aspect of a synthetic review is that it goes beyond simply compiling information and instead analyzes the collected data to create a new theoretical or conceptual framework that can significantly impact the field. Biological Reviews is abstracted and indexed in various databases, including Abstracts on Hygiene & Communicable Diseases, Academic Search, AgBiotech News & Information, AgBiotechNet, AGRICOLA Database, GeoRef, Global Health, SCOPUS, Weed Abstracts, and Reaction Citation Index, among others.