A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

IF 5.7 | Zone 1, Earth Sciences | Q1, GEOSCIENCES, MULTIDISCIPLINARY | Hydrology and Earth System Sciences | Pub Date: 2023-10-09 | DOI: 10.5194/hess-27-3565-2023
Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, Nick van de Giesen

Abstract

High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) the number of error types and error communities, (ii) the bias and noise for each error type, (iii) the error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provides an opportunity for targeted re-training of CSs based on mistake tendencies.
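The two model assumptions in the abstract can be illustrated with a small Bayes-rule calculation. The sketch below is not the authors' implementation: it assumes each error type acts as a linear distortion of the true rainfall with Gaussian noise, and every name and number in it (the error-type parameters, the two communities, their probabilities) is hypothetical and chosen only for illustration. It also omits the link between community membership and CS characteristics. Given an observation, its ground-truth value, and the CS's community, it returns the posterior probability of each error type.

```python
import math

# Hypothetical error types (illustrative values, not from the paper).
# Assumed form: observed = offset + slope * truth + Normal(0, sigma), in mm.
ERROR_TYPES = {
    "none":    {"slope": 1.0, "offset": 0.0, "sigma": 1.0},
    "unit":    {"slope": 0.1, "offset": 0.0, "sigma": 1.0},   # e.g. cm recorded as mm
    "outlier": {"slope": 1.0, "offset": 0.0, "sigma": 50.0},  # very noisy readings
}

# Hypothetical error communities: each is a probability distribution
# over error types, shared by every CS assigned to that community.
COMMUNITIES = {
    "mostly_correct": {"none": 0.97, "unit": 0.02, "outlier": 0.01},
    "unit_prone":     {"none": 0.60, "unit": 0.35, "outlier": 0.05},
}


def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def error_type_posterior(observed, truth, community):
    """Posterior over error types: prior from the community times likelihood."""
    prior = COMMUNITIES[community]
    weights = {}
    for name, params in ERROR_TYPES.items():
        expected = params["offset"] + params["slope"] * truth
        weights[name] = prior[name] * normal_pdf(observed, expected, params["sigma"])
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# A CS from the hypothetical "unit_prone" community reports 2.0 mm
# when the reference gauge recorded 20.0 mm.
posterior = error_type_posterior(observed=2.0, truth=20.0, community="unit_prone")
most_likely = max(posterior, key=posterior.get)
```

In this toy setting the observation is one tenth of the reference value, so the posterior concentrates on the hypothetical "unit" type. An observation whose most likely type is anything other than "none" is the kind the abstract describes as requiring verification.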
Source journal: Hydrology and Earth System Sciences (Earth sciences: multidisciplinary geosciences)
CiteScore: 10.10
Self-citation rate: 7.90%
Articles published: 273
Review time: 15 months
About the journal: Hydrology and Earth System Sciences (HESS) is a not-for-profit international two-stage open-access journal for the publication of original research in hydrology. HESS encourages and supports fundamental and applied research that advances the understanding of hydrological systems, their role in providing water for ecosystems and society, and the role of the water cycle in the functioning of the Earth system. A multi-disciplinary approach is encouraged that broadens the hydrological perspective and the advancement of hydrological science through integration with other cognate sciences and cross-fertilization across disciplinary boundaries.