A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science

Artificial Intelligence for the Earth Systems. Published: 2023-01-01. DOI: 10.1175/aies-d-22-0039.1

Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff


Citations: 1

Abstract

Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to persistent homology, a TDA tool that is particularly useful for the analysis of imagery. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and the information that can be gleaned from it. To this end, we frame our discussion around a guiding example: classifying satellite images from the sugar, fish, flower, and gravel dataset produced by Rasp et al. for the study of mesoscale organization of clouds. We demonstrate how persistent homology and persistence landscapes (one way to vectorize its output) can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how this behavior can be explained in terms of image-level features. One of the core strengths of persistent homology is its interpretability, so throughout this paper we discuss not just the patterns we find but also why those results are to be expected given the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will understand the results they obtain from applying the included GitHub example code.

Significance Statement

Information such as the geometric structure and texture of image data can greatly support inference of the physical state of an observed Earth system, for example, in remote sensing, to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way, unlike black-box methods such as deep neural networks. The purpose of this paper is to explain intuitively what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach by identifying certain cloud patterns in satellite imagery and find that the resulting model is indeed interpretable.
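
The abstract names two concrete steps: computing the persistent homology of an image and turning the result into a fixed-length vector with persistence landscapes. The following is a minimal pure-Python sketch of both steps under stated assumptions; it is not the authors' GitHub code. It assumes a grayscale image given as a list of rows, a sublevel-set filtration with 4-connected pixels, and only 0-dimensional features (connected components). 1-dimensional features (loops) require a full cubical complex, for which a library such as GUDHI would normally be used. The function names `sublevel_persistence_0d` and `persistence_landscape`, the sampling grid, and the choice to cap deaths at `max_death` are illustrative assumptions.

```python
import math


def sublevel_persistence_0d(image):
    """0-dimensional sublevel-set persistence of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Pixels enter the
    filtration in order of increasing value and are joined to already-present
    4-connected neighbours (an assumed connectivity). Returns sorted
    (birth, death) pairs; the component that never merges gets death = inf.
    """
    rows, cols = len(image), len(image[0])
    parent = {}  # union-find parent pointers, keyed by (row, col)
    birth = {}   # birth value of each component, keyed by its root pixel

    def find(p):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    pairs = []
    pixels = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    for value, r, c in pixels:
        p = (r, c)
        parent[p] = p
        birth[p] = value  # every new pixel starts as its own component
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # out of bounds, or not yet in the filtration
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            # Elder rule: the younger component (later birth) dies at this value.
            young, old = (rp, rq) if birth[rp] > birth[rq] else (rq, rp)
            if birth[young] < value:  # drop zero-persistence pairs
                pairs.append((birth[young], value))
            parent[young] = old
    for root in {find(p) for p in list(parent)}:
        pairs.append((birth[root], math.inf))
    return sorted(pairs)


def persistence_landscape(pairs, grid, num_landscapes=2, max_death=None):
    """Sample the first `num_landscapes` persistence landscapes on `grid`.

    Each (b, d) pair contributes the tent function max(0, min(t - b, d - t));
    lambda_k(t) is the k-th largest tent value at t. Deaths (including
    infinite ones) are capped at `max_death`, by default the largest grid
    value. Returns one flat vector: lambda_1 on the grid, then lambda_2, etc.
    """
    if max_death is None:
        max_death = max(grid)
    layers = [[0.0] * len(grid) for _ in range(num_landscapes)]
    for j, t in enumerate(grid):
        tents = sorted(
            (float(max(0.0, min(t - b, min(d, max_death) - t))) for b, d in pairs),
            reverse=True,
        )
        for k in range(min(num_landscapes, len(tents))):
            layers[k][j] = tents[k]
    return [v for layer in layers for v in layer]


if __name__ == "__main__":
    # Four local minima (0, 1, 2, 3) separated by a ridge of value 5.
    img = [[0, 5, 1],
           [5, 5, 5],
           [2, 5, 3]]
    diagram = sublevel_persistence_0d(img)
    print(diagram)  # [(0, inf), (1, 5), (2, 5), (3, 5)]
    print(persistence_landscape(diagram, [0, 1, 2, 3, 4, 5], max_death=5))
    # [0.0, 1.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
```

Every image yields a vector of length `num_landscapes * len(grid)`, so vectors from different images can be stacked and passed to a standard classifier. A superlevel-set filtration can be obtained from the same code by negating the image first.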



