A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

Hydrology and Earth System Sciences; IF 5.7; Zone 1 (Earth Sciences); JCR Q1 (Geosciences, Multidisciplinary); Pub Date: 2023-10-09; DOI: 10.5194/hess-27-3565-2023

Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, Nick van de Giesen


Citations: 0



A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

Abstract. High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) number of error types and error communities, (ii) bias and noise for each error type, (iii) error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provides an opportunity for targeted re-training of CSs based on mistake tendencies.
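As an illustration only, the two modeling assumptions in the abstract can be sketched as a small mixture model: each observation has an error type with its own bias and noise, and the probability of each error type depends on the observer's community. This is not the authors' implementation. The linear form observed = slope * truth + offset + noise, the Gaussian noise, every parameter value, the community names, and the function names are assumptions made for this example, and the reference ("truth") value is assumed to be available, as it is during calibration.

```python
import math

# Hypothetical error types: (slope, offset, noise_sd). Illustrative values only,
# not the calibrated values from the paper.
ERROR_TYPES = {
    "none":     (1.0, 0.0, 0.5),   # unbiased, small noise
    "unit":     (0.1, 0.0, 0.5),   # e.g. value recorded in the wrong unit
    "meniscus": (1.0, 2.0, 0.5),   # constant offset from misreading the meniscus
    "outlier":  (1.0, 0.0, 30.0),  # unbiased but very noisy
}

# Hypothetical communities: probability of each error type for their members.
COMMUNITIES = {
    "mostly_correct": {"none": 0.96, "unit": 0.01, "meniscus": 0.02, "outlier": 0.01},
    "unit_prone":     {"none": 0.60, "unit": 0.35, "meniscus": 0.03, "outlier": 0.02},
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def error_type_posterior(observed, truth, community):
    """Posterior probability of each error type for one observation.

    Bayes' rule: p(type | obs) is proportional to
    p(type | community) * p(obs | type, truth).
    """
    weights = COMMUNITIES[community]
    unnormalized = {
        name: weights[name] * normal_pdf(observed, slope * truth + offset, sd)
        for name, (slope, offset, sd) in ERROR_TYPES.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def needs_verification(observed, truth, community, threshold=0.5):
    """Flag an observation when the 'none' error type is not the likely explanation."""
    return error_type_posterior(observed, truth, community)["none"] < threshold
```

For example, `error_type_posterior(2.0, 20.0, "unit_prone")` puts most of the probability on the "unit" type, because the submitted value is close to one tenth of the reference value, and `needs_verification` would flag that observation. In the paper the number of error types, their parameters, and the community memberships are inferred from data rather than fixed by hand as they are here.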


Source journal

Hydrology and Earth System Sciences (Geosciences: Earth Sciences, Multidisciplinary)

CiteScore: 10.10
Self-citation rate: 7.90%
Articles published: 273
Review time: 15 months

About the journal: Hydrology and Earth System Sciences (HESS) is a not-for-profit international two-stage open-access journal for the publication of original research in hydrology. HESS encourages and supports fundamental and applied research that advances the understanding of hydrological systems, their role in providing water for ecosystems and society, and the role of the water cycle in the functioning of the Earth system. A multi-disciplinary approach is encouraged that broadens the hydrological perspective and the advancement of hydrological science through integration with other cognate sciences and cross-fertilization across disciplinary boundaries.