Treating gaps and biases in biodiversity data as a missing data problem.

Biological Reviews | Pub Date: 2024-08-08 | DOI: 10.1111/brv.13127 | IF 11 | Zone 1 (Biology) | JCR Q1 (BIOLOGY)
Diana E Bowler, Robin J Boyd, Corey T Callaghan, Robert A Robinson, Nick J B Isaac, Michael J O Pocock
Citations: 0

Abstract

Big biodiversity data sets have great potential for monitoring and research because of their large taxonomic, geographic and temporal scope. Such data sets have become especially important for assessing temporal changes in species' populations and distributions. Gaps in the available data, especially spatial and temporal gaps, often mean that the data are not representative of the target population. This hinders drawing large-scale inferences, such as about species' trends, and may lead to misplaced conservation action. Here, we conceptualise gaps in biodiversity monitoring data as a missing data problem, which provides a unifying framework for the challenges and potential solutions across different types of biodiversity data sets. We characterise the typical types of data gaps as different classes of missing data and then use missing data theory to explore the implications for questions about species' trends and factors affecting occurrences/abundances. By using this framework, we show that bias due to data gaps can arise when the factors affecting sampling and/or data availability overlap with those affecting species. But a data set per se is not biased. The outcome depends on the ecological question and statistical approach, which determine choices around which sources of variation are taken into account. We argue that typical approaches to long-term species trend modelling using monitoring data are especially susceptible to data gaps since such models do not tend to account for the factors driving missingness. To identify general solutions to this problem, we review empirical studies and use simulation studies to compare some of the most frequently employed approaches to deal with data gaps, including subsampling, weighting and imputation. All these methods have the potential to reduce bias but may come at the cost of increased uncertainty of parameter estimates. 
Weighting techniques are arguably the least used so far in ecology and have the potential to reduce both the bias and variance of parameter estimates. Regardless of the method, the ability to reduce bias critically depends on knowledge of, and the availability of data on, the factors creating data gaps. We use this review to outline the necessary considerations when dealing with data gaps at different stages of the data collection and analysis workflow.
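
The abstract's central mechanism (bias appears when the factors affecting sampling overlap with those affecting the species) and three of the corrections it names (subsampling, weighting and imputation) can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' simulation study: the single binary covariate `accessible`, the probabilities in `P_OCCUR` and `P_SAMPLE`, and the functions `simulate` and `estimates` are all assumptions made for illustration. The target is the proportion of all sites where the species occurs.

```python
import random
from dataclasses import dataclass

# Hypothetical toy scenario (not from the review): one binary covariate,
# "accessible", raises BOTH the chance the species occurs at a site and the
# chance that the site is recorded. All probabilities are illustrative.
P_OCCUR = {0: 0.2, 1: 0.6}    # occurrence probability by covariate value
P_SAMPLE = {0: 0.05, 1: 0.5}  # sampling (inclusion) probability by covariate value


@dataclass
class Site:
    accessible: int  # covariate driving both occurrence and sampling
    occupied: int    # true state; only observed if the site is sampled
    sampled: bool


def simulate(n_sites, seed=1):
    """Generate a population of sites and flag which ones were sampled."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        x = int(rng.random() < 0.5)
        y = int(rng.random() < P_OCCUR[x])
        s = rng.random() < P_SAMPLE[x]
        sites.append(Site(x, y, s))
    return sites


def estimates(sites, seed=2):
    """Compare a naive estimate of occupancy with three bias corrections."""
    rng = random.Random(seed)
    obs = [s for s in sites if s.sampled]
    truth = sum(s.occupied for s in sites) / len(sites)

    # 1. Naive: mean over observed sites, ignoring why they were sampled.
    naive = sum(s.occupied for s in obs) / len(obs)

    # 2. Weighting: Hajek inverse-probability estimator; each observed site is
    #    weighted by 1 / P(sampled | covariate). Assumes P_SAMPLE is known.
    w = [1 / P_SAMPLE[s.accessible] for s in obs]
    weighted = sum(wi * s.occupied for wi, s in zip(w, obs)) / sum(w)

    # 3. Imputation: fill each unsampled site with the observed occupancy rate
    #    of sites sharing its covariate value, then average over all sites.
    rate = {}
    for x in (0, 1):
        grp = [s.occupied for s in obs if s.accessible == x]
        rate[x] = sum(grp) / len(grp)
    filled = [s.occupied if s.sampled else rate[s.accessible] for s in sites]
    imputed = sum(filled) / len(filled)

    # 4. Subsampling: thin the over-sampled stratum so that every site ends up
    #    with the same overall inclusion probability, then take the plain mean.
    floor = min(P_SAMPLE.values())
    sub = [s for s in obs if rng.random() < floor / P_SAMPLE[s.accessible]]
    subsampled = sum(s.occupied for s in sub) / len(sub)

    return {"truth": truth, "naive": naive, "weighted": weighted,
            "imputed": imputed, "subsampled": subsampled,
            "n_observed": len(obs), "n_subsample": len(sub)}


if __name__ == "__main__":
    for name, value in estimates(simulate(200_000)).items():
        print(f"{name:>11}: {round(value, 3)}")
```

In this toy setting the gaps are missing at random (MAR) given `accessible`, so the naive mean overstates occupancy while all three corrections recover it approximately. Subsampling does so by discarding data (roughly 10,000 of about 55,000 observed sites are kept here), which is the kind of uncertainty cost the abstract mentions. Every correction relies on the covariate that drives sampling being recorded; if sampling depended on an unrecorded factor, none of them could be computed correctly.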

Source journal
Biological Reviews (Biology)
CiteScore: 21.30
Self-citation rate: 2.00%
Articles published: 99
Review time: 6-12 weeks
About the journal: Biological Reviews is a scientific journal that covers a wide range of topics in the biological sciences. It publishes several review articles per issue, aimed at both non-specialist biologists and researchers in the field. The articles are scholarly and include extensive bibliographies. Authors are instructed to keep the diverse readership in mind and write their articles accordingly. The reviews in Biological Reviews serve as comprehensive introductions to specific fields, presenting the current state of the art and highlighting gaps in knowledge. Each article can be up to 20,000 words long and includes an abstract, a thorough introduction, and a statement of conclusions. The journal focuses on publishing synthetic reviews, which are based on existing literature and address important biological questions. These reviews are of interest to a broad readership and are timely, often relating to fast-moving fields or new discoveries. A key aspect of a synthetic review is that it goes beyond simply compiling information and instead analyses the collected data to create a new theoretical or conceptual framework that can significantly impact the field. Biological Reviews is abstracted and indexed in various databases, including Abstracts on Hygiene & Communicable Diseases, Academic Search, AgBiotech News & Information, AgBiotechNet, AGRICOLA Database, GeoRef, Global Health, SCOPUS, Weed Abstracts, and Reaction Citation Index, among others.