Increasing the Reproducibility and Replicability of Supervised AI/ML in the Earth Systems Science by Leveraging Social Science Methods

Earth and Space Science; Pub Date: 2024-07-04; DOI: 10.1029/2023EA003364; IF 2.9; Zone 3 (Earth Sciences); JCR Q2 (Astronomy & Astrophysics)
Christopher D. Wirz, Carly Sutter, Julie L. Demuth, Kirsten J. Mayer, William E. Chapman, Mariana Goodall Cains, Jacob Radford, Vanessa Przybylo, Aaron Evans, Thomas Martin, Lauriana C. Gaudet, Kara Sulia, Ann Bostrom, David John Gagne II, Nick Bassill, Andrea Schumacher, Christopher Thorncroft

Abstract

Artificial intelligence (AI) and machine learning (ML) pose a challenge for achieving science that is both reproducible and replicable. The challenge is compounded in supervised models that depend on manually labeled training data, as they introduce additional decision-making and processes that require thorough documentation and reporting. We address these limitations by providing an approach to hand labeling training data for supervised ML that integrates quantitative content analysis (QCA), a method from social science research. The QCA approach provides a rigorous and well-documented hand labeling procedure to improve the replicability and reproducibility of supervised ML applications in Earth systems science (ESS), as well as the ability to evaluate them. Specifically, the approach requires (a) the articulation and documentation of the exact decision-making process used for assigning hand labels in a “codebook” and (b) an empirical evaluation of the “reliability” of the hand labelers. In this paper, we outline the contributions of QCA to the field, along with an overview of the general approach. We then provide a case study to further demonstrate how this framework has been and can be applied when developing supervised ML models for applications in ESS. With this approach, we provide an actionable path forward for addressing ethical considerations and goals outlined by recent AGU work on ML ethics in ESS.
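
The empirical reliability evaluation in step (b) can be illustrated with a small sketch. The abstract does not say which reliability statistic the authors use, so the choice of Cohen's kappa here is an assumption; it is one common chance-corrected agreement measure for two labelers. The function name `cohens_kappa` and the example labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same items.

    Illustrative only: the paper's abstract does not name a statistic.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each labeler's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both labelers used one identical label for every item.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical hand labels from two labelers following the same codebook.
labeler_1 = ["snow", "clear", "snow", "wet", "clear", "snow", "wet", "clear"]
labeler_2 = ["snow", "clear", "wet", "wet", "clear", "snow", "snow", "clear"]

kappa = cohens_kappa(labeler_1, labeler_2)
print(round(kappa, 3))
```

A value near 1 indicates agreement well beyond chance, while a value near 0 suggests the codebook is not yet producing consistent labels. Other statistics, including ones that handle more than two labelers or missing labels, may suit a given project better.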

Source journal: Earth and Space Science
Subject area: Earth and Planetary Sciences, General Earth and Planetary Sciences
CiteScore: 5.50
Self-citation rate: 3.20%
Articles published: 285
Review time: 19 weeks
Journal description: Marking AGU’s second new open access journal in the last 12 months, Earth and Space Science is the only journal that reflects the expansive range of science represented by AGU’s 62,000 members, including all of the Earth, planetary, and space sciences, and related fields in environmental science, geoengineering, space engineering, and biogeochemistry.