Increasing Trust in New Data Sources: Crowdsourcing Image Classification for Ecology

International Statistical Review (IF 1.7; Region 3, Mathematics; JCR Q1, Statistics & Probability) Pub Date: 2023-05-21 DOI: 10.1111/insr.12542
Edgar Santos-Fernandez, Julie Vercelloni, Aiden Price, Grace Heron, Bryce Christensen, Erin E. Peterson, Kerrie Mengersen
Citations: 0

Increasing Trust in New Data Sources: Crowdsourcing Image Classification for Ecology

Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images from the Great Barrier Reef, Australia. We show that the model achieved superior results in general and, for difficult tasks, a weighted consensus method that uses only groups of experts and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
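
The contrast the abstract draws between majority voting and an ability-weighted consensus can be illustrated with a minimal Python sketch. This is not the authors' implementation: the participant names, labels, weights, and the functions `majority_vote`, `weighted_vote`, and `p_correct` are all hypothetical, and `p_correct` uses a generic Rasch-style logistic form as a stand-in, since the abstract does not specify the exact item response model.

```python
import math
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label among the votes."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Return the label with the largest total weight.

    votes   -- list of (participant, label) pairs
    weights -- dict mapping participant to a non-negative weight
               (hypothetical ability scores; unknown participants get 0)
    """
    totals = defaultdict(float)
    for participant, label in votes:
        totals[label] += weights.get(participant, 0.0)
    return max(totals, key=totals.get)


def p_correct(ability, difficulty):
    """Generic Rasch-style probability of a correct classification.

    Assumption for illustration only: the probability depends on the
    gap between participant ability and image difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical votes on a single difficult image.
votes = [
    ("novice_1", "algae"),
    ("novice_2", "algae"),
    ("novice_3", "algae"),
    ("expert_1", "coral"),
    ("expert_2", "coral"),
]
# Hypothetical ability-based weights.
weights = {
    "novice_1": 0.2,
    "novice_2": 0.2,
    "novice_3": 0.2,
    "expert_1": 0.9,
    "expert_2": 0.9,
}

plain = majority_vote([label for _, label in votes])
weighted = weighted_vote(votes, weights)
print(plain, weighted)
```

In this toy case the three novices outvote the two experts under a plain majority, while the weighted consensus follows the experts. In the paper the abilities are estimated by the Bayesian model rather than fixed by hand as they are here.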

Source journal
International Statistical Review (Mathematics: Statistics & Probability)
CiteScore: 4.30
Self-citation rate: 5.00%
Articles published: 52
Review time: >12 weeks
About the journal: International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.